Site Reliability Engineer - Systems (7 to 10 Years)

Employer Active

1 Vacancy
Job Location

Bengaluru - India

Monthly Salary

Not Disclosed

Job Description

About PhonePe Limited:

Headquartered in India, PhonePe launched its flagship product, the PhonePe digital payments app, in August 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40 Million) merchants. PhonePe also processes over 33 Crore (330 Million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe's portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses in India (Pincode, a hyperlocal e-commerce platform, and Indus AppStore, a localized app store for the Android ecosystem), which are aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work every day! And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you're excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Site Reliability Engineer - Systems

Experience: 7 to 10 Years

Summary

We are seeking a skilled and proactive Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in Linux systems administration, an understanding of database management, and a proven track record of troubleshooting complex system-level issues. You will be responsible for ensuring the reliability, performance, and scalability of our production environments, balancing system and database stability through robust monitoring, debugging, and automation practices.

Responsibilities:

  • Lead incident response and resolution: Proactively troubleshoot, debug, and resolve complex system-level incidents and outages encompassing Linux operating systems, applications, and database technologies.
  • Conduct deep-dive root cause analysis: Perform thorough post-incident analysis to identify underlying issues in production environments, implementing sustainable solutions.
  • Design and implement robust monitoring: Develop, maintain, and enhance comprehensive system and database monitoring, alerting, and observability solutions (e.g., Grafana, Prometheus, PMM).
  • Drive automation and efficiency: Automate Linux system administration tasks, operational runbooks, and database maintenance to improve system reliability, consistency, and operational efficiency.
  • Collaborate on resilient deployments: Partner with development and engineering teams to ensure seamless, reliable, and secure software deployments and infrastructure changes.
  • Architect scalable infrastructure: Contribute to the architectural design and implementation of highly scalable, resilient, and performant infrastructure solutions.
  • Enhance on-call effectiveness: Participate in and continuously improve on-call rotations, developing tools and processes to reduce alert fatigue and minimize human error.
  • Foster technical growth: Mentor and guide junior Site Reliability Engineers (SREs), promoting knowledge sharing and skill development within the team.

Qualifications:

  • Extensive Linux Expertise: Proven experience in advanced Linux systems administration, including a deep understanding of file systems, kernel tuning (sysctl), and performance optimization.
  • Advanced Troubleshooting & Debugging: Exceptional ability to debug and rapidly resolve complex distributed system-level issues in high-pressure production environments.
  • Configuration Management: Hands-on experience with industry-standard configuration management tools (e.g., SaltStack, Ansible, Puppet).
  • Load Balancing & Proxying: Practical experience with load balancing technologies (e.g., Nginx, HAProxy, LVS) and their configuration for high availability.
  • Containerization & Orchestration: Strong understanding and practical experience with containerization (e.g., Docker) and container orchestration platforms (e.g., Kubernetes, Mesosphere).
  • Monitoring & Alerting Tooling: Proficiency in implementing, maintaining, and leveraging system and database monitoring platforms (e.g., Grafana, Prometheus, PMM) and custom scripting for alerts.
  • Automation & Scripting Mastery: Highly proficient in developing automation solutions using scripting languages (e.g., Python, Shell scripting, Go) for operational tasks.
  • Networking Fundamentals: Solid understanding of core networking concepts and protocols (e.g., TCP/IP, DNS, DHCP, BGP, IPTables, IP and routing protocols).
  • Database Administration Fundamentals: Strong grasp of relational database concepts and practical experience with database administration principles.

Preferred Qualifications:

  • Cloud Infrastructure Experience: Experience managing and troubleshooting private/on-premise cloud environments, with a focus on identifying and mitigating hardware-related issues and their impact.
  • Relational Database Specialization: Deep practical experience with MariaDB, Percona Server, and/or MySQL, encompassing advanced database administration, performance tuning, and complex replication topologies.
  • Backup & Recovery Expertise: Hands-on experience with robust backup and restore technologies, including ZFS.
  • Message Queuing Systems: Familiarity with message queuing systems like RabbitMQ (RMQ).

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation Benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement, and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas, and debates, where diverse perspectives lead to deeper understanding and better quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color, or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.

Read more about PhonePe on our blog.

Life at PhonePe

PhonePe in the news

Employment Type

Full Time
