Job Description:
- Design, implement, and maintain monitoring, alerting, and observability solutions using Datadog and Splunk
- Ensure high availability, scalability, and reliability of production systems on AWS
- Manage and support containerized applications using Docker and Kubernetes
- Automate infrastructure provisioning and configuration using Terraform and Ansible
- Participate in incident response, root cause analysis (RCA), and post-mortems
- Optimize system performance, cost, and reliability through proactive monitoring
- Collaborate with development teams to improve system resilience and deployment practices
- Implement SRE best practices, including SLIs, SLOs, SLAs, and error budgets
- Support CI/CD pipelines and deployment automation