Site Reliability Engineer (SRE)

IT America Inc

Job Location:

Plano, TX - USA

Monthly Salary: Not Disclosed
Posted on: 4 days ago
Vacancies: 1 Vacancy

Job Summary

Position: Site Reliability Engineer (SRE)

Location: Richmond, VA or Plano, TX

Work Model: Hybrid, 3 days onsite per week

Duration: Long-term contract

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems for a large enterprise environment. This role will focus on ensuring high availability, reliability, performance, and scalability of mission-critical applications running on AWS.

Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.

Key Responsibilities:

  • Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
  • Monitor system health, performance, and availability using SRE best practices
  • Implement automation to reduce manual operational work
  • Troubleshoot production incidents and perform root cause analysis (RCA)
  • Develop and maintain scripts and tools to improve system reliability and efficiency
  • Partner with application development, platform, and infrastructure teams
  • Support on-call rotations and incident response as required
  • Enforce operational excellence, security, and compliance standards

Required Skills & Qualifications:

  • Former Capital One experience HIGHLY preferred
  • Must provide credentials for rehire eligibility verification
  • Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
  • Python scripting experience strongly preferred
  • Bash or Shell scripting experience will also be considered
  • Experience with Linux-based systems and troubleshooting
  • Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
  • Experience supporting production environments at scale

Preferred Qualifications:

  • Experience with CI/CD pipelines
  • Infrastructure as Code (Terraform, CloudFormation)
  • Containerization and orchestration (Docker, Kubernetes)
  • Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
  • Experience working in highly regulated enterprise environments

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting