Site Reliability Engineer (SRE)

Five9

Job Location: Bengaluru, India

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.

Living our values every day results in our team-first culture and enables us to innovate, grow, and thrive while enjoying the journey together. We celebrate diversity and foster an inclusive environment, empowering our employees to be their authentic selves.

We are seeking a Site Reliability Engineer (SRE) to join our team and help build and maintain highly reliable, scalable systems. This role combines software engineering and operations expertise to ensure our services meet and exceed ambitious reliability targets while enabling rapid development and deployment.

This position requires approximately 50% software development and 50% operational work, focusing on automation, monitoring, and system reliability rather than manual operations. The team works collaboratively with our platform, application, and database teams to provide a reliable and available service.

Key Responsibilities

Observability & Monitoring

  • Dashboards & Metrics: Design and implement comprehensive dashboards covering OS/platform-level and application-level monitoring, broken into primary (RED) and secondary (USE) indicators.

  • Availability & Reliability: Establish and maintain SLIs, SLOs, and error budgets for the service.

  • Performance Monitoring: Build alerting systems and performance monitoring to proactively identify and resolve issues before they impact users.

  • Incident Response: Participate in on-call rotations, lead incident response efforts (including post-mortem analysis and remediation), maintain on-call routing, and assign application-level problems to engineering teams.

Infrastructure Automation & Deployment

  • CI/CD Pipeline Management: Build and optimize CI/CD pipelines for speed and resilience.

  • Infrastructure as Code: Develop and maintain infrastructure using tools like Terraform, Ansible, or similar.

  • Configuration Management: Automate system configuration and ensure consistency across environments. Implement and recommend best practices for configuration control.

Security & Compliance

  • Security Automation: Ensure security scanning systems are in place and review escalated vulnerabilities.

  • Access Control: Maintain proper authentication, authorization, and audit logging systems.

  • Compliance Reporting: Ensure systems meet regulatory and industry standards.

  • Security Incident Response: Participate in security incident response and remediation efforts.

Cost Optimization

  • Resource Management: Monitor and optimize cloud resource usage and costs.

  • Capacity Planning: Analyze usage patterns and plan for future capacity needs.

  • Cost Analysis: Provide recommendations for cost-effective architecture and resource allocation.

  • Right-sizing: Implement automated scaling and resource optimization strategies.

Common Services & Platform Engineering

  • Shared Infrastructure: Build and maintain common services (notification systems, caching layers, message queues, or third-party stacks).

  • Database Operations: Manage database reliability, performance, and scaling (where not handled by DB teams).

  • Service Mesh & Networking: Implement and maintain service discovery, load balancing, and network policies.

  • Developer Tools: Create and maintain tools and platforms that improve developer productivity and reliability.

Required Qualifications

Technical Skills

  • Programming Languages: Proficiency in at least two of Python, Shell, Java, Node.js, or similar.

  • Cloud Platforms: Experience with AWS, GCP, or Azure.

  • Containerization: Hands-on experience with Docker, Kubernetes, and container orchestration.

  • Monitoring & Observability: Experience with Prometheus, Grafana, the ELK stack, or similar tools.

  • Infrastructure as Code: Proficiency with Ansible, Terraform, Helm, or similar.

  • Version Control: Expert-level Git usage and collaborative development practices.

  • CI/CD Pipelines: Hands-on experience with GitLab CI/CD, GitHub Actions, or similar.

SRE-Specific Knowledge

  • Experience defining and maintaining SLOs and SLIs.

  • Understanding and implementation of error budget policies.

  • Proven track record in toil reduction and automation.

  • Experience with capacity planning and performance testing.

Preferred Qualifications

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.

  • Experience with microservices and distributed systems.

  • Knowledge of security best practices and compliance frameworks.

  • Experience with chaos engineering and reliability testing.

  • Prior experience in an SRE or DevOps role at a tech company.

  • Contributions to open-source projects or technical communities.

Success Metrics

  • Maintain or improve service availability and reliability metrics.

  • Demonstrated reduction in manual operational work through automation.

  • Effective participation in incident response and prevention.

  • High-quality, well-tested code contributions.

  • Strong collaboration with development teams to improve system reliability.

Team Culture & Values

  • Blameless Post-Mortems: Learn from failures without blame.

  • Automation First: Prefer automated solutions over manual processes.

  • Measure Everything: Data-driven decisions and continuous improvement.

  • Knowledge Sharing: Document and share expertise.

  • Work-Life Balance: Sustainable on-call practices and reasonable load.

Growth Opportunities

  • Work on cutting-edge infrastructure and reliability challenges.

  • Exposure to large-scale distributed systems and modern cloud technologies.

  • Clear career path toward Senior SRE, Staff Engineer, or Management roles.

  • Collaboration with engineering teams across the organization.

Five9 embraces diversity and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better we are. Five9 is an equal opportunity employer.

View our privacy policy, including our privacy notice to California residents, here:

Five9 will never request that an applicant send money as a prerequisite for commencing employment with Five9.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Five9 delivers the most reliable cloud contact center that empowers organizations to deliver extraordinary customer experiences.