Site Reliability Engineer

Axle

Job Location:

Frederick, MD - USA

Salary: $140,000 - $155,000

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

(ID: 2025-1135)

Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).

Benefits We Offer:

100% Medical, Dental & Vision Coverage for Employees
Paid Time Off and Paid Holidays
401K match up to 5%
Educational Benefits for Career Growth
Employee Referral Bonus
Flexible Spending Accounts:
- Healthcare (FSA)
- Parking Reimbursement Account (PRK)
- Dependent Care Assistance Program (DCAP)
- Transportation Reimbursement Account (TRN)

The Site Reliability Engineer role centers on modernizing and consolidating a complex multi-cloud environment across AWS, Azure, and GCP, building a scalable, secure, and observable platform from the ground up using Kubernetes, AI/ML infrastructure, and zero-trust principles. You'll combine DevOps and SRE practices to support mission-driven scientific and clinical programs, emphasizing automation, reliability, compliance, and proactive monitoring while enabling innovation through AI-driven tooling. The team culture is highly collaborative and growth-oriented, valuing experimentation, continuous learning, and cross-functional leadership, with opportunities to shape future multi-cloud and platform engineering solutions.

Responsibilities:

Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using enterprise Splunk, Grafana, and OpenTelemetry tools
Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements
Develop and maintain real-time asset inventory systems across cloud, on-prem, and hybrid environments
Automate workload onboarding and offboarding processes, ensuring standardization and governance
Track system ownership, dependencies, and lifecycle states for operational transparency
Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact
Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments
Implement automated compliance tracking and enforcement aligned with organizational and regulatory standards (e.g., NIST, FISMA, FedRAMP)
Embed ITIL processes (incident, change, problem, and configuration management) into SRE workflows
Build and maintain automated deployment environments and pipelines that enforce security, compliance, and operational standards
Develop golden paths and standardized platform templates for consistent workload deployment
Automate provisioning, patching, configuration management, and environment lifecycle
Leverage AI/ML coding assistants and vibe coding practices to rapidly develop automation scripts, tools, and internal platforms
Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights
Lead adoption of AI-enhanced SRE practices, including intelligent remediation and predictive operations
Champion DevOps and SRE practices, including Infrastructure as Code, CI/CD, observability, and reliability engineering
Build developer-friendly platforms (golden paths) that simplify deployments, reduce friction, and improve velocity
Enable and optimize infrastructure for AI/ML workloads, including data pipelines, storage systems, and inference environments, as well as GPU-enabled and high-performance compute workloads
Build and manage containerized and orchestrated platforms (Docker, Kubernetes)
Support cloud migration, modernization, and platform standardization initiatives
Ensure systems meet security, compliance, backup, and disaster recovery requirements
Evangelize and promote best practices in DevOps, SRE, and platform engineering to developer communities
Stay abreast of new technologies in your areas, including but not limited to AIOps, MLOps, cloud computing and deployment, site reliability engineering, infrastructure automation, security best practices, and data engineering

Requirements:

Must have a total of 6 years of experience in DevOps/SRE roles with monitoring and observability tools (Prometheus, Grafana, ELK, or cloud-native equivalents) for on-prem and cloud-hosted workloads
Must have 4 years of hands-on Linux experience, including Ubuntu/CentOS/Red Hat operating systems, containers, dependency management, and administration support
Must have 4 years of experience automating Infrastructure-as-Code (IaC) deployments to one of the following cloud platforms: Amazon AWS, Google GCP, and Microsoft Azure
Must have 4 years of experience with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, and GitHub Actions
Strong scripting skills (Python, Bash, PowerShell, or similar)
Must be proficient in using vibe coding and coding assistants to develop scripts, tools, and applications for DevOps and SRE use cases
Must be proficient in debugging, troubleshooting, and/or deploying SQL and/or NoSQL databases, object storage, and web servers; familiarity with the open-source programming stack for R or Core Java is desired but not mandatory
Must be willing to learn new technologies and to adopt and adapt to emerging technologies or needs from project to project
Cloud certifications are preferred
Certifications in Grafana, Splunk, Docker, and Kubernetes are preferred but optional

Disclaimer: The above description is meant to illustrate the general nature of work and level of effort being performed by individuals assigned to this position or job description. This is not intended as a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.

The diversity of Axle's employees is a tremendous asset. We are firmly committed to providing equal opportunity in all aspects of employment and will not tolerate any illegal discrimination or harassment based on age, race, gender, religion, national origin, disability, marital status, covered veteran status, sexual orientation, status with respect to public assistance, and other characteristics protected under state, federal, or local law. We are also committed to deterring those who aid, abet, or induce discrimination or coerce others to discriminate.

Accessibility: If you need an accommodation as part of the employment process please contact:

This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate's experience, qualifications, skills, and location.

Salary Range

$140,000 - $155,000 USD
