Site Reliability Engineer (SRE)

Job Location:

Denver, CO - USA

Monthly Salary: Not Disclosed
Posted on: 7 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Description:

We are seeking a highly experienced Site Reliability Engineer (SRE) with a strong Java development background to lead reliability initiatives and ensure the stability, scalability, and performance of mission-critical systems. This role blends deep hands-on engineering with leadership, ownership, and a proactive approach to reliability and operations.

The ideal candidate has evolved from a strong developer into an SRE/DevOps leader, understands production systems deeply, and can partner effectively with development, platform, and operations teams.

Key Responsibilities:

  • Design, build, and maintain highly reliable, scalable, and fault-tolerant systems in production environments.
  • Embed reliability best practices (SLOs, SLIs, error budgets) into the software development lifecycle.
  • Work closely with development teams on Java Spring Boot microservices to improve operability and resilience.
  • Automate operational workflows to reduce manual effort and improve system efficiency.
  • Monitor system health, performance, and availability; proactively identify risks and bottlenecks.
  • Lead incident management, on-call support, and root cause analysis for production issues.
  • Drive continuous improvement initiatives focused on availability, scalability, and performance.
  • Support and oversee release and deployment activities, including after-hours support when required.
  • Champion best practices around CI/CD, infrastructure as code, and cloud-native operations.
  • Mentor engineers and provide technical leadership across SRE and development teams.
  • Collaborate with stakeholders to align reliability goals with business priorities.

Required Qualifications:

  • 12 years of IT experience in SRE, DevOps, or Production Engineering
  • Strong Java development experience (Java 17, Spring Boot, Microservices, Spring Web)
  • Hands-on experience with OpenShift (OCP), Kubernetes, and Docker
  • Strong expertise in MongoDB (data modeling, design, optimization)
  • Experience with Apache Kafka and event-driven architectures
  • Working knowledge of Oracle Database
  • Familiarity with BDD practices
  • Solid experience with CI/CD automation and IaC (Terraform, Ansible)
  • Exposure to AI-assisted development tools (e.g., GitHub Copilot)
  • Excellent troubleshooting skills in high-pressure production environments
  • Strong communication, collaboration, and ownership mindset

Preferred Qualifications:

  • Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
  • Knowledge of security best practices, compliance standards, and production hardening.
  • Prior experience leading or mentoring SRE teams or guiding engineers in reliability practices.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting