The Site Reliability Engineer (SRE) will provide L2/L3 support for AWS cloud infrastructure and production environments, ensuring high availability, reliability, and operational efficiency. This role focuses on automating operational tasks, monitoring systems, and collaborating with DevOps, Development, and Infrastructure teams to resolve issues and improve service performance.
Responsibilities:
- Provide L2/L3 support for AWS cloud infrastructure and production environments.
- Implement and maintain automation for operational tasks, deployments, and monitoring.
- Monitor system health, troubleshoot incidents, and ensure high availability of services.
- Develop and enhance scripts/tools to reduce manual effort and improve efficiency.
- Work closely with DevOps, Development, and Infrastructure teams to resolve issues.
- Participate in on-call rotations and incident management during US shift hours.
- Maintain and improve monitoring, alerting, and logging systems.
- Ensure adherence to SRE best practices for reliability, scalability, and performance.
- Document runbooks, SOPs, and knowledge base articles.
Qualifications:
- Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch).
- Experience in automation and scripting using Python, Shell, or PowerShell.
- Familiarity with Infrastructure as Code tools (Terraform or CloudFormation).
- Understanding of CI/CD pipelines and DevOps practices.
- Experience with monitoring tools such as CloudWatch, Grafana, Prometheus, or ELK.
- Good understanding of Linux systems and networking concepts.
- Exposure to containerization (Docker/Kubernetes).
- Ability to troubleshoot production issues under pressure.
- Excellent verbal and written communication skills.
- Willingness to work a US time zone shift.
Remote Work:
Yes
Employment Type:
Full-time