Randstad is seeking a Site Reliability Engineer for a high-impact role with a premier client based in Washington, DC. In this position, you will bridge the gap between development and operations by applying a software engineering mindset to system administration and infrastructure. You will be responsible for ensuring the scalability, performance, and high availability of cloud-based services across AWS and Azure environments. By leveraging Infrastructure-as-Code, advanced observability with Dynatrace, and SRE principles like error budgets and SLOs, you will drive operational excellence and lead incident response efforts for mission-critical applications.
Key Responsibilities
- Deployment & Automation: Architect and manage CI/CD pipelines (GitHub Actions, AWS CodePipeline) and automate global infrastructure using Terraform, CloudFormation, or CDK.
- Performance & Capacity: Drive cost-optimization initiatives, manage auto-scaling thresholds, and execute resiliency and performance testing to ensure system durability.
- Incident Management: Act as a primary on-call responder using ITIL frameworks and ServiceNow; develop Root Cause Analysis (RCA) documentation and maintain knowledge bases.
- Observability & Monitoring: Implement distributed tracing and optimize monitoring via Dynatrace and Kibana, building advanced dashboards and anomaly detection.
- Reliability Engineering: Define and monitor SLIs and SLOs while managing error budgets to balance feature velocity with system stability.
- Security & Compliance: Oversee service accounts, manage digital certificates, and execute rapid remediation for security incidents.
Qualifications
- Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.
- Experience: 2 to 4 years of professional experience in SRE, DevOps, or Infrastructure roles.
- Cloud Proficiency: Practical hands-on experience with both AWS and Azure platforms.
- Technical Skills: Mid-level proficiency in Python (or similar scripting languages) and configuration management tools like Ansible.
- Containerization: Solid understanding of Docker and orchestration via Kubernetes or ECS.
- Infrastructure Fundamentals: Strong knowledge of Linux systems, networking protocols, and both relational and NoSQL database architectures.
- Soft Skills: Excellent written and verbal communication skills with the ability to manage competing priorities independently.
- Flexibility: Ability to participate in a production on-call rotation, including work outside standard business hours.
This is a high PRIORITY requisition. This is a PROACTIVE requisition.
Background Check : No
Drug Screen : No