Site Reliability Engineer (SRE)

Hyderabad - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

This job posting is outdated and the position may be filled

Job Summary

Role Title: SRE

Mandatory Skills:

SRE, AWS or GCP
Observability
Incident Management
New Relic, Grafana, Prometheus
Terraform
Python

Role Description / Skills:

Cloud Platforms:

5 years of experience with AWS or GCP and with modern pipeline technologies and approaches.

Containerization and Orchestration:

3 years of experience designing, deploying, and monitoring containerization technologies like Docker and container orchestration tools such as Kubernetes.

Systems / Infrastructure as Code (IaC):

3 years of hands-on experience with IaC tools such as Terraform or CloudFormation.

Monitoring and Logging:

4 years of expertise in implementing and managing observability platforms and monitoring tools (New Relic, Grafana, Prometheus) that feed into SLOs/SLIs, and logging solutions like ELK (Elasticsearch, Logstash, Kibana) or Splunk.

Automation:

3 years of hands-on experience with scripting languages such as Python or Bash and configuration management tools like Salt, Ansible, or Chef.

CI/CD:

3 years of hands-on experience with CI/CD pipeline tools such as Jenkins.

Reliability and Performance:

5 years of designing and implementing highly reliable, scalable, and available systems, with a focus on system optimization, performance, and resource utilization.

Incident Response:

3 years of primary on-call incident management, working with incident response procedures, tools such as PagerDuty, and related best practices.

Collaboration and Communication:

You have a knack for fostering professional growth and knowledge-sharing, with a proven ability to contribute to a collaborative and skill-enhancing work environment.

Documentation:

Proficient in creating and maintaining clear and comprehensive documentation.

Problem-Solving:

You strive to understand the problem you are trying to solve before deciding on a solution, and you are thoughtful and methodical in implementing it rather than jumping to the next tool.
Ability to troubleshoot complex issues in distributed systems.
