Manager, Site Reliability Engineering (Python, Terraform, CI/CD, Observability)

Bengaluru, India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

The SRE Manager leads a team of Site Reliability Engineers responsible for delivering high availability, security, and performance across multiple products within the Value-Added Services organization. This role balances hands-on technical oversight with people leadership, guiding team execution in incident response, automation, observability, environment management, and operations. The Manager champions consistency, operational maturity, and the use of Generative AI and automation to reduce toil and strengthen reliability engineering practices. As a cross-functional partner, this leader will collaborate with product, engineering, and operations teams to implement resilient designs, scalable processes, and continuous improvement across production systems.

Responsibilities:

Lead the delivery of secure, reliable, and high-performing application services across distributed and hybrid environments.
Improve operational excellence, engineering discipline, and team execution through coaching, prioritization, and consistent process reinforcement.
Drive zero-downtime reliability through proactive monitoring, structured incident response, and rigorous root-cause remediation.
Oversee the full environment management lifecycle: deployment governance, configuration updates, operational readiness assessments, and risk evaluation.
Foster an inclusive, collaborative, high-accountability culture focused on continuous learning and team-wide development.
Build strong relationships with engineering, product, architecture, and operations teams to align on service priorities and long-term reliability goals.
Communicate effectively with technical and non-technical audiences, providing frameworks for decision-making and problem-solving.
Champion automation and Generative AI tooling to reduce manual processes, eliminate toil, and scale operational capabilities.
Lead cloud and hybrid infrastructure adoption initiatives with a focus on resilience and minimal downtime.
Facilitate incident bridges, coordinate cross-team collaboration, and ensure proper escalation paths for critical issues.
Proactively communicate operational insights, risks, and status updates to cross-functional stakeholders and PRE leadership.
Ensure the SRE team consistently delivers secure, stable, and efficient infrastructure aligned with business and engineering objectives.
Establish and track key SRE performance indicators (SLOs, error budgets, operational KPIs).
Drive growth, upskilling, and performance development across the SRE team, supporting engineers at multiple experience levels.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications:

Basic Qualifications:

8-11 years of relevant experience in SRE, Systems Engineering, or Software Engineering.
2-4 years of experience leading engineers (people leadership or technical leadership).
Demonstrated ability to manage and prioritize team execution across incidents, change management, operations, and automation.
Strong understanding of distributed systems, on-prem and cloud architectures, microservices, containers, and API ecosystems.
Proven ability to drive troubleshooting, RCA, and performance improvements.
Familiarity with Linux/Unix systems, CI/CD workflows, networking fundamentals, and observability practices.
Ability to communicate complex technical topics to senior leadership, cross-functional stakeholders, and non-technical audiences.
Proven ability to build team capability through mentorship, feedback, and performance coaching.
Experience driving the adoption of automation and/or Generative AI to improve operational efficiency.
Experience supporting or leading 24x7 operations and on-call programs.

Preferred Qualifications:

Hands-on experience with Java/J2EE, REST/SOAP architectures, and distributed services.
Direct experience supporting containerized applications and cloud platforms (AWS, GCP).
Expertise in Linux, Jenkins, Java/.NET applications, relational databases, Tomcat, and Apache.
Proficiency in scripting and automation (Bash, Python, JavaScript, etc.).
Strong knowledge of infrastructure components (Linux, VMs, MQ, storage).
Understanding of Generative AI and its operational applications.
Experience building tools and automation to streamline production support.
Solid understanding of observability platforms and best practices.

Additional Information:

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Employment Type: Full-time

About Company

Visa

Visa (NYSE: V) is a world leader in digital payments, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories. Our purpose is to uplift everyone, everywhere by being the best way to pay and b ...
