Job Overview
We are seeking a Site Reliability Engineer (SRE) to support large-scale, distributed, fault-tolerant systems in a global technology environment. This role combines software engineering and systems operations to improve system reliability, scalability, automation, and performance.
What You Will Do:
- Design, build, and maintain scalable and highly available infrastructure systems.
- Develop automation tools and scripts to improve operational efficiency.
- Proactively monitor system performance and troubleshoot infrastructure issues.
- Implement monitoring, alerting, SLIs, SLOs, and SLA tracking.
- Participate in 24/7 on-call rotations and incident response activities.
- Conduct root cause analysis and support post-mortem reviews.
- Collaborate with engineering and cross-functional teams on system improvements.
- Ensure infrastructure security, compliance, and adherence to reliability best practices.
- Support containerized environments using Docker and Kubernetes.
What Makes You A Good Fit:
- Bachelor's or Master's degree in Computer Science, IT, Engineering, or a related field.
- Minimum 3 years of experience in SRE, Systems Engineering, or Software Engineering.
- Proficient in programming languages such as Python, Go, Java, or C.
- Strong Linux systems and networking knowledge.
- Experience with Docker, Kubernetes, Prometheus, and Grafana is preferred.
- Knowledge of relational databases and system architecture.
- Strong analytical, troubleshooting, and communication skills.
What We Offer:
- Opportunity to work on large-scale global infrastructure systems.
- Exposure to advanced cloud, automation, and reliability engineering practices.
- Career growth within a dynamic technology environment.
- Collaborative and fast-paced team culture.