Role Summary
We are seeking a highly skilled DevOps / Site Reliability Engineer (SRE) to design, build, and operate scalable, reliable, and secure cloud platforms on AWS and Azure. The ideal candidate will have strong experience in Dynatrace monitoring and observability configuration, CI/CD automation, infrastructure reliability, and production support. This role focuses on improving system availability, performance, and operational excellence across complex distributed systems.
Key Responsibilities
DevOps & Cloud Engineering
- Design, implement, and maintain cloud infrastructure on AWS and Azure using Infrastructure as Code (Terraform, ARM, Bicep, CloudFormation).
- Build and manage CI/CD pipelines using tools such as Jenkins, Azure DevOps, GitHub Actions, or GitLab CI.
- Automate provisioning, configuration management, and deployment processes.
- Support containerized and microservices-based architectures using Docker and Kubernetes (EKS / AKS).
Site Reliability Engineering (SRE)
- Define and enforce SLIs, SLOs, and SLAs to ensure service reliability.
- Lead incident response, root cause analysis (RCA), and post-incident reviews.
- Implement proactive reliability practices, capacity planning, and performance optimization.
- Reduce operational toil through automation and self-healing mechanisms.
Monitoring & Observability (Dynatrace)
- Configure and manage Dynatrace for full-stack observability across applications, infrastructure, and cloud services.
- Implement:
  - OneAgent deployments (VMs, containers, Kubernetes)
  - Custom dashboards, alerts, and anomaly detection
  - Service flow, distributed tracing, and RUM
- Integrate Dynatrace with CI/CD pipelines, ITSM tools (ServiceNow), and alerting systems.
- Tune alerts to minimize noise and improve actionable insights.
Security & Compliance
- Implement cloud security best practices, including IAM, secrets management, and encryption.
- Integrate security and compliance checks into CI/CD pipelines.
- Collaborate with security teams on vulnerability remediation and audits.
Collaboration & Operations
- Work closely with development, QA, security, and platform teams.
- Provide on-call support for production systems as part of an SRE rotation.
- Document operational runbooks, standards, and best practices.
Required Skills & Qualifications
Technical Skills
- Strong hands-on experience with AWS and/or Azure (compute, networking, storage, monitoring).
- Proven experience in Dynatrace configuration and administration.
- Expertise in Linux/Unix environments.
- Hands-on scripting experience (Bash, Python, PowerShell).
- Experience with Kubernetes, Docker, and microservices architectures.
- Strong knowledge of CI/CD tools and Git-based version control.
- Experience with logging and monitoring tools (Dynatrace, Prometheus, Grafana, ELK).
DevOps & SRE Practices
- Solid understanding of SRE principles, reliability engineering, and high-availability design.
- Experience with incident management, RCA, and performance tuning.
- Familiarity with infrastructure automation and configuration management tools.
Preferred Qualifications
- Cloud certifications (AWS, Azure, or Kubernetes).
- Experience integrating Dynatrace with cloud-native services and Kubernetes.
- Knowledge of service mesh, API gateways, or event-driven architectures.
- Exposure to FinOps, cost optimization, or multi-cloud strategies.
Key Competencies
- Strong troubleshooting and problem-solving skills.
- Ability to work in high-availability, production-critical environments.
- Excellent communication and stakeholder collaboration skills.
- Continuous improvement mindset with a focus on automation and reliability.