Site Reliability Engineer (SRE)

Plano, TX - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Position: Site Reliability Engineer (SRE)

Location: Richmond, VA or Plano, TX

Work Model: Hybrid, 3 days onsite per week

Duration: Long-term contract

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems in a large enterprise environment. This role focuses on ensuring the high availability, reliability, performance, and scalability of mission-critical applications running on AWS.

Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.

Key Responsibilities:

Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
Monitor system health, performance, and availability using SRE best practices
Implement automation to reduce manual operational work
Troubleshoot production incidents and perform root cause analysis (RCA)
Develop and maintain scripts and tools to improve system reliability and efficiency
Partner with application development, platform, and infrastructure teams
Support on-call rotations and incident response as required
Enforce operational excellence, security, and compliance standards

Required Skills & Qualifications:

Former Capital One experience HIGHLY preferred
Must provide credentials for rehire eligibility verification
Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
Python scripting experience strongly preferred
Bash or Shell scripting experience will also be considered
Experience with Linux-based systems and troubleshooting
Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
Experience supporting production environments at scale

Preferred Qualifications:

Experience with CI/CD pipelines
Infrastructure as Code (Terraform, CloudFormation)
Containerization and orchestration (Docker, Kubernetes)
Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
Experience working in highly regulated enterprise environments
