Staff Site Reliability Engineer

Visa

Job Location:

São Paulo - Brazil

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

The Platform Engineering team within the SRE Tribe is responsible for owning and evolving the containerized platform that underpins critical and mission-critical workloads. The team focuses on building reliable, resilient, and scalable cloud-native infrastructure, enabling engineering teams to deploy and operate services efficiently and safely. By applying strong SRE principles, automation, and Infrastructure as Code practices, the platform ensures high availability, operational excellence, and long-term sustainability of core systems.

What You'll Do

Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components, including cloud infrastructure primitives, Kubernetes clusters, networking, ingress, service discovery, service mesh, and data-plane components.
Design, build, and evolve a highly reliable and resilient containerized platform supporting critical workloads, applying SRE and cloud-native best practices.
Lead the design and implementation of infrastructure bootstrap orchestration, enabling deterministic, repeatable platform bring-up and teardown across cloud, network, and Kubernetes layers.
Drive a strong Infrastructure-as-Code and GitOps-first approach, ensuring platform components are reproducible, auditable, automated, testable, and reversible.
Identify and close automation gaps, leading initiatives that significantly reduce manual effort, onboarding time, and operational risk at scale.
Apply and promote SRE principles such as fault isolation, graceful degradation, capacity planning, saturation control, and clear failure modes across the platform.
Continuously assess platform reliability risks and proactively improve stability, resilience, and operational readiness.
Act as a technical reference and escalation point for platform reliability, participating in on-call rotations, incident response, post-incident reviews, and problem management.
Improve platform operability by simplifying day-2 operations, standardizing upgrade and rollback strategies, and reducing Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR).
Ensure platform operations align with security, compliance, and internal control requirements.
Collaborate closely with cross-functional engineering teams, influencing technical decisions and promoting best practices through hands-on contributions and technical leadership.
Contribute to architectural and technical discussions, supporting continuous improvement and long-term platform evolution.
Stay up to date with emerging technologies, SRE practices, and cloud-native patterns, sharing insights at squad and collective levels.
Be recognized for delivering high-impact, high-quality platform and reliability solutions across the organization.

This is a remote position. A remote position does not require that job duties be performed within proximity of a Visa office location. Employees in remote positions may be required to be present at a Visa office with scheduled notice.

Qualifications

For this role, you must be based in Brazil.

Language Skills

Proficiency in English at B1 level or above (Intermediate)

Technical Skills

Strong hands-on experience with public cloud platforms, preferably AWS (experience with Azure is also valued).
Deep experience operating Kubernetes at scale (EKS or equivalent), including cluster lifecycle management and cluster services.
Strong expertise with Service Mesh technologies, preferably Istio, with familiarity in alternatives such as App Mesh or Linkerd.
Advanced knowledge of Infrastructure as Code practices using tools such as Terraform.
Solid understanding of cloud-native, containerized microservices architectures.
Strong experience with observability concepts and tooling, including logs, metrics, traces, alerts, and Golden Signals.
Proven ability to operate, debug, and troubleshoot complex distributed systems.
Strong understanding of incident management, on-call operations, and reliability engineering practices.
Experience designing reliable, automated infrastructure with minimal manual intervention.
Excellent collaboration and communication skills, with the ability to influence across teams and act as a technical reference.

Preferred Qualifications

Proven background in SRE or Platform Engineering roles, especially in senior or staff-level individual contributor positions.
Experience working with critical or mission-critical systems.
Experience leading large-scale automation and infrastructure bootstrap initiatives.
Familiarity with security, compliance, and operational controls in cloud environments.
Experience in highly regulated industries or large-scale production platforms.
5 or more years of work experience with a Bachelor's Degree, or more than 2 years of work experience with an Advanced Degree (e.g., Master's, MBA).

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Remote Work: Yes

Employment Type: Full-time

About Company

Visa

Visa (NYSE: V) is a world leader in digital payments, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories. Our purpose is to uplift everyone, everywhere by being the best way to pay and b ...