Site Reliability Engineer 23 Cloud (5 to 12 Years)

PhonePe

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

About PhonePe Limited:

Headquartered in India, its flagship product, the PhonePe digital payments app, was launched in Aug 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40 Million) merchants. PhonePe also processes over 33 Crore (330 Million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe's portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses (Pincode - hyperlocal e-commerce, and Indus AppStore - Localized App Store for the Android ecosystem) in India, which are aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work, every day! And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you're excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Job Summary

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) with 5 to 12 years of experience to manage, scale, and ensure the high availability of our core infrastructure. This role is open to experts specialized in either Microsoft Azure or AWS. You will be responsible for deep-level cloud architecture, automation, and complex networking to support a high-volume, mission-critical environment where downtime is not an option.

Key Responsibilities

Cloud & Infrastructure Management

  • Cloud Operations: Configure, maintain, and manage Ubuntu/Linux Virtual Machines in your primary cloud environment (Azure or AWS).
  • Managed Services: Design and manage cloud-native components for log storage, database management, and alerting (e.g., Azure Storage/ADX or AWS S3/CloudWatch).

Networking & Connectivity

  • Complex Networking: Configure and maintain critical network components, including Firewalls, Route Tables, and Virtual Gateways (VPC/VNet).
  • Hybrid Links: Establish and manage high-speed connectivity via ExpressRoute (Azure) or Direct Connect (AWS), along with IPsec VPNs for external environments.
  • Troubleshooting: Resolve complex routing issues and manage network migrations with zero-to-minimal downtime.

Automation & Infrastructure as Code (IaC)

  • Everything as Code: Drive automation for all BAU (Business As Usual) tasks using Terraform, writing new code for all infrastructure components.
  • Config Management: Use SaltStack or Ansible for automated deployment and configuration of services on VMs.
  • Tooling: Develop custom scripts or services in Python, Go, or Java to eliminate manual toil.

Database & Data Management

  • High Availability: Set up and manage HA services like MySQL and Aerospike.
  • Global Replication: Implement database replication across regions, manage migrations, and ensure data synchronization during network partitions.
  • Data Protection: Handle robust backup strategies for databases, logs, and system configurations.

Monitoring & Observability

  • Modern Stack: Implement and manage monitoring systems like Prometheus, VictoriaMetrics, or Riemann.
  • Logging & Viz: Proficiency with Loki for centralized logging and Grafana for building mission-critical dashboards and alerting.

Required Technical Expertise

Cloud Platform (Azure OR AWS)

  • Core Services: Deep hands-on experience with either Azure (VMs, Storage Accounts, CosmosDB, ADX) or AWS (EC2, S3, RDS).
  • Security: Integrate platform and VM-level services with the SOC; collaborate with Infosec to fix vulnerabilities.

Operating Systems & Middleware

  • OS: Expert proficiency in Linux (Ubuntu) for system administration and kernel-level performance troubleshooting.
  • Web/Proxy: Expert management of Nginx and HAProxy (proxy management, endpoint addition, and complex rewrite rules).
  • Messaging: Experience with RabbitMQ (RMQ) and containerization using Docker.

Networking Protocols

  • Deep Knowledge: Mastery of DNS, BGP routing, and private connectivity troubleshooting.

Essential Soft Skills & Qualifications

  • Experience: 5 to 12 years in an SRE or high-level DevOps role.
  • Ownership: A proactive approach to identifying and solving infrastructure challenges before they impact users.
  • Incident Management: Ability to lead incident response, create Root Cause Analysis (RCA) documents, and manage post-mortems.
  • SRE Principles: Experience defining SLOs/SLIs and a commitment to Toil Reduction through automation.
  • Cost Optimization: Proven ability to identify and implement cloud resource optimization to save costs.

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation Benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement, and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas, and debates, where diverse perspectives lead to deeper understanding and better quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color, or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.

Read more about PhonePe on our blog.

Life at PhonePe

PhonePe in the news


Required Experience:

IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

PhonePe is a Digital Wallet & Online Payment App that allows you to make instant Money Transfers with UPI. Recharge Mobile, DTH, Pay Utility Bills, Buy/Invest in Gold, Mutual Funds, Insurance & much more.
