Job Title: Site Reliability Engineer (SRE)
Position Type: Contract, 6 Months
Location: Washington, DC
Job Description
A premier client based in Washington, DC is seeking a Site Reliability Engineer for a high-impact role. In this position, you will bridge the gap between development and operations by applying a software engineering mindset to system administration and infrastructure. You will be responsible for ensuring the scalability, performance, and high availability of cloud-based services across AWS and Azure environments. By leveraging Infrastructure-as-Code, advanced observability with Dynatrace, and SRE principles like error budgets and SLOs, you will drive operational excellence and lead incident response efforts for mission-critical applications.
Key Responsibilities
- Deployment & Automation: Architect and manage CI/CD pipelines (GitHub Actions, AWS CodePipeline) and automate global infrastructure using Terraform, CloudFormation, or CDK.
- Performance & Capacity: Drive cost-optimization initiatives, manage auto-scaling thresholds, and execute resiliency/performance testing to ensure system durability.
- Incident Management: Act as a primary on-call responder using ITIL frameworks and ServiceNow; develop Root Cause Analysis (RCA) documentation and maintain knowledge bases.
- Observability & Monitoring: Implement distributed tracing and optimize monitoring via Dynatrace and Kibana, building advanced dashboards and anomaly detection.
- Reliability Engineering: Define and monitor SLIs and SLOs while managing error budgets to balance feature velocity with system stability.
- Security & Compliance: Oversee service accounts, manage digital certificates, and execute rapid remediation for security incidents.
Qualifications
- Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.
- Experience: 6 years of professional experience in SRE, DevOps, or Infrastructure roles.
- Cloud Proficiency: Practical hands-on experience with both AWS and Azure platforms.
- Technical Skills: Mid-level proficiency in Python (or similar scripting languages) and configuration management tools like Ansible.
- Containerization: Solid understanding of Docker and orchestration via Kubernetes or ECS.
- Infrastructure Fundamentals: Strong knowledge of Linux systems, networking protocols, and both relational and NoSQL database architectures.
- Soft Skills: Excellent written and verbal communication skills with the ability to manage competing priorities independently.
- Flexibility: Ability to participate in a production on-call rotation, including work outside standard business hours.