Position: Site Reliability Engineer (SRE)
Location: Richmond, VA or Plano, TX
Work Model: Hybrid, 3 days onsite per week
Duration: Long-term contract
Job Summary:
We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems for a large enterprise environment. This role will focus on ensuring the high availability, reliability, performance, and scalability of mission-critical applications running on AWS.
Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.
Key Responsibilities:
- Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
- Monitor system health, performance, and availability using SRE best practices
- Implement automation to reduce manual operational work
- Troubleshoot production incidents and perform root cause analysis (RCA)
- Develop and maintain scripts and tools to improve system reliability and efficiency
- Partner with application development, platform, and infrastructure teams
- Support on-call rotations and incident response as required
- Enforce operational excellence, security, and compliance standards
Required Skills & Qualifications:
- Former Capital One experience HIGHLY preferred
- Must provide credentials for rehire eligibility verification
- Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
- Python scripting experience strongly preferred
- Bash or Shell scripting experience will also be considered
- Experience with Linux-based systems and troubleshooting
- Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
- Experience supporting production environments at scale
Preferred Qualifications:
- Experience with CI/CD pipelines
- Infrastructure as Code (Terraform, CloudFormation)
- Containerization and orchestration (Docker, Kubernetes)
- Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
- Experience working in highly regulated enterprise environments