Site Reliability Engineer (Azure)

Apptad Inc

Job Location:

Richmond, VA - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Job Description:


We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in Azure Cloud Architecture to design, implement, and optimize scalable, secure, and cost-effective cloud and reliability solutions. The ideal candidate will combine architectural skills with hands-on operational excellence in monitoring, automation, and performance optimization across Azure environments.

Key Responsibilities:

  • Define and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) to ensure reliability and availability targets.
  • Evaluate existing systems and implement mechanisms for proactive monitoring, alerting, and incident response.
  • Review and enhance Non-Functional Requirements (NFR) processes, ensuring robust coverage of parameters such as performance, scalability, and reliability.
  • Design and develop reliable and automated Azure cloud architectures aligned with business and operational goals.
  • Implement CloudOps and SRE best practices, including process standardization, automation, and toil reduction.
  • Collaborate with cross-functional teams to integrate Azure services effectively while improving observability and system performance.
  • Support FinOps initiatives to ensure cost optimization and efficient resource utilization.
  • Drive initiatives for self-service enablement, noise reduction, and operational resilience.
  • Mentor junior engineers and promote a culture of reliability, ownership, and continuous improvement.

Technical Skills (Mandatory):

  • Azure Cloud Architecture, Azure App Services, Azure Functions, Azure Monitor, Azure Logic Apps, Azure DevOps, Azure SQL, Azure Front Door, Azure Service Bus.
  • Infrastructure as Code (IaC): Terraform, Ansible.
  • Automation & CI/CD: Jenkins, GitLab, PowerShell, Shell scripting.
  • Application Stack: .NET Framework, C#, Java, Spring Boot, Angular, JavaScript, Entity Framework (EF/EF Core).
  • Containers & Orchestration: Docker, Kubernetes.
  • Database: PostgreSQL.
  • Architecture Patterns: Microservices, Application Architecture, Application Re-architecting, Architectural Diagrams & Documentation.

Preferred Qualifications:

  • Experience implementing SRE principles such as error budgets, incident management, postmortems, and observability practices.
  • Strong understanding of cloud security, performance tuning, and disaster recovery strategies.
  • Familiarity with monitoring tools (e.g., Azure Monitor, Application Insights, Prometheus, Grafana).
  • Excellent problem-solving and cross-functional collaboration skills.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting