Site Reliability Engineer (7+ Years Exp) - System

Job Location

Bengaluru - India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

About PhonePe Group:

PhonePe is India's leading digital payments company, with 50 crore (500 million) registered users and 3.7 crore (37 million) merchants covering over 99% of the postal codes across India. On the back of its leadership in digital payments, PhonePe has expanded into financial services (Insurance, Mutual Funds, Stock Broking, and Lending) as well as adjacent tech-enabled businesses such as Pincode for hyperlocal shopping and Indus App Store, which is India's first localized app store. The PhonePe Group is a portfolio of businesses aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture

At PhonePe, we take extra care to make sure you give your best at work every day! Creating the right environment for you is just one of the things we do. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. Being enthusiastic about tech is a big part of being at PhonePe. If you like building technology that impacts millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Site Reliability Engineer - System

Experience: 7+ Years

Summary

We are seeking a skilled and proactive Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in Linux systems administration, an understanding of database management, and a proven track record of troubleshooting complex system-level issues. You will be responsible for ensuring the reliability, performance, and scalability of our production environments, balancing system and database stability through robust monitoring, debugging, and automation practices.

Responsibilities:

  • Lead incident response and resolution: Proactively troubleshoot, debug, and resolve complex system-level incidents and outages encompassing Linux operating systems, applications, and database technologies.
  • Conduct deep-dive root cause analysis: Perform thorough post-incident analysis to identify underlying issues in production environments and implement sustainable solutions.
  • Design and implement robust monitoring: Develop, maintain, and enhance comprehensive system and database monitoring, alerting, and observability solutions (e.g., Grafana, Prometheus, PMM).
  • Drive automation and efficiency: Automate Linux system administration tasks, operational runbooks, and database maintenance to improve system reliability, consistency, and operational efficiency.
  • Collaborate on resilient deployments: Partner with development and engineering teams to ensure seamless, reliable, and secure software deployments and infrastructure changes.
  • Architect scalable infrastructure: Contribute to the architectural design and implementation of highly scalable, resilient, and performant infrastructure solutions.
  • Enhance on-call effectiveness: Participate in and continuously improve on-call rotations, developing tools and processes to reduce alert fatigue and minimize human error.
  • Foster technical growth: Mentor and guide junior Site Reliability Engineers (SREs), promoting knowledge sharing and skill development within the team.

Qualifications:

  • Extensive Linux Expertise: Proven experience in advanced Linux systems administration, including a deep understanding of file systems, kernel tuning (sysctl), and performance optimization.
  • Advanced Troubleshooting & Debugging: Exceptional ability to debug and rapidly resolve complex, distributed, system-level issues in high-pressure production environments.
  • Configuration Management: Hands-on experience with industry-standard configuration management tools (e.g., SaltStack, Ansible, Puppet).
  • Load Balancing & Proxying: Practical experience with load balancing technologies (e.g., Nginx, HAProxy, LVS) and their configuration for high availability.
  • Containerization & Orchestration: Strong understanding of and practical experience with containerization (e.g., Docker) and container orchestration platforms (e.g., Kubernetes, Mesosphere).
  • Monitoring & Alerting Tooling: Proficiency in implementing, maintaining, and leveraging system and database monitoring platforms (e.g., Grafana, Prometheus, PMM) and custom scripting for alerts.
  • Automation & Scripting Mastery: Highly proficient in developing automation solutions using scripting languages (e.g., Python, Shell scripting, Go) for operational tasks.
  • Networking Fundamentals: Solid understanding of core networking concepts and protocols (e.g., TCP/IP, DNS, DHCP, BGP, IPTables, IP and routing protocols).
  • Database Administration Fundamentals: Strong grasp of relational database concepts and practical experience with database administration principles.

Preferred Qualifications:

  • Cloud Infrastructure Experience: Experience managing and troubleshooting private/on-premise cloud environments, with a focus on identifying and mitigating hardware-related issues and their impact.
  • Relational Database Specialization: Deep practical experience with MariaDB, Percona Server, and/or MySQL, encompassing advanced database administration, performance tuning, and complex replication topologies.
  • Backup & Recovery Expertise: Hands-on experience with robust backup and restore technologies, including ZFS.
  • Message Queuing Systems: Familiarity with message queuing systems like RabbitMQ (RMQ).

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.

Life at PhonePe

PhonePe in the news

Employment Type

Full Time
