Job Title: Site Reliability Engineer (SRE) - GCP Infrastructure
Experience: 5 years
Location: Bangalore (Work from Office)
Shift Timing: 2:00 PM - 10:00 PM
About the Role:
We're looking for an experienced and motivated Site Reliability Engineer (SRE) to join our team in Bangalore. As an SRE, you'll be responsible for ensuring the reliability and performance of our cloud infrastructure on Google Cloud Platform (GCP). If you have hands-on experience using Terraform and Kubernetes to manage and automate cloud infrastructure, this could be the perfect opportunity for you!
Key Responsibilities:
- Infrastructure Management: Leverage your expertise in GCP to design, implement, and manage scalable, highly available infrastructure.
- Automation & Infrastructure as Code (IaC): Use Terraform to automate the provisioning and management of infrastructure components, ensuring consistency and efficiency.
- Kubernetes Operations: Manage and optimize Kubernetes clusters for containerized applications, ensuring they are resilient and performant.
- Monitoring & Reliability: Develop and implement proactive monitoring solutions to track infrastructure health, detect issues, and minimize downtime.
- Collaboration: Work closely with development teams to improve system reliability and performance while scaling for growth.
- Incident Management: Participate in incident resolution, ensuring that root causes are identified and recurrences are prevented.
- Continuous Improvement: Contribute to the development of best practices, automation frameworks, and strategies for improving the reliability of cloud-native systems.
Required Skills & Qualifications:
- Experience: 5 years of hands-on experience in a Site Reliability Engineering or similar role, with a focus on GCP infrastructure.
- Cloud Infrastructure: Strong experience managing GCP services, including networking, storage, and compute resources.
- Terraform: Extensive experience with Terraform for automating cloud infrastructure provisioning.
- Kubernetes: Proficiency with Kubernetes for container orchestration, including deployment, scaling, and management of applications.
- Scripting & Automation: Experience with scripting languages (e.g., Python, Shell) to automate tasks and improve system reliability.
- Monitoring Tools: Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana) to ensure systems run smoothly.
- Problem Solving: Strong troubleshooting skills with the ability to resolve complex infrastructure issues.
- Shift Timing: Comfortable working in the 2:00 PM - 10:00 PM shift.
Skills: scripting, monitoring tools, reliability, GCP, problem solving, Terraform, Kubernetes