Site Reliability Engineer (SRE)

Denver, CO - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Description:

We are seeking a highly experienced Site Reliability Engineer (SRE) with a strong Java development background to lead reliability initiatives and ensure the stability, scalability, and performance of mission-critical systems. This role blends deep hands-on engineering with leadership, ownership, and a proactive approach to reliability and operations.

The ideal candidate has evolved from a strong developer into an SRE/DevOps leader, understands production systems deeply, and can partner effectively with development, platform, and operations teams.

Key Responsibilities:

Design, build, and maintain highly reliable, scalable, and fault-tolerant systems in production environments.
Embed reliability best practices (SLOs, SLIs, error budgets) into the software development lifecycle.
Work closely with development teams on Java Spring Boot microservices to improve operability and resilience.
Automate operational workflows to reduce manual effort and improve system efficiency.
Monitor system health, performance, and availability; proactively identify risks and bottlenecks.
Lead incident management, on-call support, and root cause analysis for production issues.
Drive continuous improvement initiatives focused on availability, scalability, and performance.
Support and oversee release and deployment activities, including after-hours support when required.
Champion best practices around CI/CD, infrastructure as code, and cloud-native operations.
Mentor engineers and provide technical leadership across SRE and development teams.
Collaborate with stakeholders to align reliability goals with business priorities.

Required Qualifications:

12 years of IT experience in SRE, DevOps, or Production Engineering
Strong Java development experience (Java 17, Spring Boot, Microservices, Spring Web)
Hands-on experience with OpenShift (OCP), Kubernetes, and Docker
Strong expertise in MongoDB (data modeling, design, optimization)
Experience with Apache Kafka and event-driven architectures
Working knowledge of Oracle Database
Familiarity with BDD practices
Solid experience with CI/CD automation and IaC (Terraform, Ansible)
Exposure to AI-assisted development tools (e.g., GitHub Copilot)
Excellent troubleshooting skills in high-pressure production environments
Strong communication and collaboration skills, with an ownership mindset

Preferred Qualifications:

Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
Knowledge of security best practices, compliance standards, and production hardening.
Prior experience leading or mentoring SRE teams or guiding engineers in reliability practices.
