Job Title: Site Reliability Engineer (SRE)
Experience: 6 to 9 years
Location: Chennai
Job Overview:
We are seeking a skilled and proactive Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will be responsible for maintaining the reliability, availability, and performance of our systems. We're looking for someone with solid experience in monitoring, scripting, and dashboarding to ensure our services run smoothly and efficiently.
Key Responsibilities:
- System Monitoring: Design, implement, and maintain monitoring systems to track the performance, availability, and reliability of applications and infrastructure.
- Incident Management: Troubleshoot, resolve, and document incidents across various systems to ensure minimal downtime.
- Automation & Scripting: Write and maintain scripts to automate routine tasks and improve operational efficiency. Experience with scripting languages such as Python, Bash, or similar is essential.
- Dashboarding: Develop and maintain dashboards to visualize key metrics and system health, enabling proactive identification of potential issues.
- Collaboration: Work closely with development teams to design reliable, scalable systems that can handle production traffic.
- On-call Support: Participate in on-call rotations to ensure 24/7 support for critical infrastructure and services.
- Capacity Planning: Analyze system capacity and forecast future needs to ensure systems can scale effectively.
Skills & Qualifications:
- Experience: 6-9 years of hands-on experience in an SRE, DevOps, or related role.
- Scripting Knowledge: Strong proficiency in scripting languages (e.g., Python, Bash, or similar).
- Monitoring Tools: In-depth experience with monitoring tools (e.g., Prometheus, Grafana, Nagios).
- Dashboarding: Expertise in creating visualizations and dashboards that make system performance easy to monitor and understand.
- Problem-Solving: Strong analytical skills with a demonstrated ability to troubleshoot and resolve complex issues.
- Communication Skills: Excellent communication skills and the ability to work in a collaborative, fast-paced environment.
Preferred Qualifications:
- Familiarity with cloud platforms (AWS, GCP, Azure).
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Knowledge of CI/CD pipelines and their implementation in an SRE environment.
- Previous experience with high-traffic systems and the ability to design for scale.