Site Reliability Engineer (SRE)

Washington, AR - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Randstad is seeking a Site Reliability Engineer for a high-impact role with a premier client based in Washington, DC. In this position, you will bridge the gap between development and operations by applying a software engineering mindset to system administration and infrastructure. You will be responsible for ensuring the scalability, performance, and high availability of cloud-based services across AWS and Azure environments. By leveraging Infrastructure-as-Code, advanced observability with Dynatrace, and SRE principles such as error budgets and SLOs, you will drive operational excellence and lead incident response efforts for mission-critical applications.

Key Responsibilities

Deployment & Automation: Architect and manage CI/CD pipelines (GitHub Actions, AWS CodePipeline) and automate global infrastructure using Terraform, CloudFormation, or CDK.
Performance & Capacity: Drive cost-optimization initiatives, manage auto-scaling thresholds, and execute resiliency and performance testing to ensure system durability.
Incident Management: Act as a primary on-call responder using ITIL frameworks and ServiceNow; develop Root Cause Analysis (RCA) documentation and maintain knowledge bases.
Observability & Monitoring: Implement distributed tracing and optimize monitoring in Dynatrace and Kibana, including advanced dashboards and anomaly detection.
Reliability Engineering: Define and monitor SLIs and SLOs while managing error budgets to balance feature velocity with system stability.
Security & Compliance: Oversee service accounts, manage digital certificates, and execute rapid remediation for security incidents.

Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.
Experience: 2 to 4 years of professional experience in SRE, DevOps, or infrastructure roles.
Cloud Proficiency: Practical hands-on experience with both AWS and Azure platforms.
Technical Skills: Mid-level proficiency in Python (or similar scripting languages) and configuration management tools like Ansible.
Containerization: Solid understanding of Docker and orchestration via Kubernetes or ECS.
Infrastructure Fundamentals: Strong knowledge of Linux systems, networking protocols, and both relational and NoSQL database architectures.
Soft Skills: Excellent written and verbal communication skills, with the ability to manage competing priorities independently.
Flexibility: Ability to participate in a production on-call rotation including work outside standard business hours.

This is a high-priority requisition. This is a proactive requisition.

Background Check: No

Drug Screen: No