Site Reliability Engineer

Fortive

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

As a Site Reliability Engineer, you will solve complex problems in a fast-paced, collaborative environment, supporting transformational projects in cloud technologies and automation for a leading SaaS product in the healthcare industry. You will drive reliability, scalability, and performance across applications, microservices, and infrastructure, partnering with cross-functional teams to ensure a highly available, resilient, and efficient platform.

Key Responsibilities

Monitoring & Alerting

Develop and maintain comprehensive monitoring for system health, applications, microservices, dependencies, and infrastructure.
Establish baseline metrics and set up customized alerts for deviations, collaborating with development teams to define key indicators and thresholds.
Manage alert routing to appropriate audiences to prevent alert fatigue.
Monitor and debug Microsoft Azure resources and assist support teams with technical troubleshooting.

Incident Management & Response

Oversee incident management processes, including triage, severity assessment, and coordination of war rooms with relevant stakeholders (DevOps, development, QA, customer service).
Maintain an on-call rotation to monitor, analyze, and resolve critical infrastructure issues and incidents, including emergency response.
Document incidents and perform root cause analysis (RCA), ensuring thorough follow-up and continuous improvement.
Lead postmortem reviews and drive action items to completion.

Proactive Reliability & Disaster Recovery

Design and execute tabletop exercises and disaster recovery simulations (e.g., service/region interruptions, failover testing, traffic management) to validate high availability and resiliency.
Document outcomes and collaborate with DevOps and development teams to implement infrastructure and application improvements.
Create standard procedures for incident scenarios where fixes cannot be immediately implemented.

Continuous Feedback & Reporting

Develop dashboards and reports to monitor system performance, error rates, resource consumption, latency, and other key indicators.
Establish consistent reporting schedules for SLA, system utilization, and customer metrics.
Provide feedback to development and DevOps teams to drive improvements in application and infrastructure performance.

DevOps Pipeline & Automation

Analyze and optimize DevOps pipelines for efficient and reliable deployment operations.
Drive automation initiatives to improve operational efficiency and reduce manual intervention.
Apply progressive rollout strategies and canary deployments.

Collaboration & Communication

Work closely with cross-functional teams across the organization, including QA, development, DevOps, and customer service.
Clearly articulate technical concepts to non-technical colleagues and stakeholders.
Foster a culture of accountability, respect, excellence, and customer service.
Contribute to documentation and knowledge sharing across teams.
Advocate for reliability and operational excellence in architecture and design discussions.

Capacity Planning & Scalability

Participate in capacity planning and scalability assessments to ensure systems can meet future growth and demand.

Required Qualifications:

Education & Experience Guidelines

Bachelor's degree in computer science or a relevant field
8-10 years of relevant work experience
Experience with Azure DevOps, Kubernetes, Docker, CI/CD, Azure, and AWS.
Significant experience with database technologies especially Microsoft SQL.
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM templates).
Familiarity with observability platforms (e.g., Prometheus, Grafana).
Scripting skills (e.g., PowerShell, Python, Bash) for automation and tooling.
Knowledge of security best practices for cloud environments.
Strong communication and collaboration skills.
Excellent self-management and time management abilities.
Creative problem-solving skills.
Willingness to learn new technologies and adapt to a rapidly changing environment.
Technology certifications (e.g., Azure DevOps Engineer) or willingness to obtain them.
Occasional travel may be required.

Other Preferred Knowledge, Skills, Abilities, or Certifications:

Security Certifications: HITRUST CSF Practitioner, CISSP
AI/ML Integration Awareness: Supporting ML pipelines in DevOps workflows
Policy-as-Code Tools: Open Policy Agent (OPA), HashiCorp Sentinel
Disaster Recovery Planning: High-availability architecture and recovery strategies
Cloud Cost Optimization: Performance tuning and resource efficiency
Multi-Cloud Experience: Supporting hybrid or multi-cloud environments

Fortive 9 Behaviors by Level:

Executing and Contributing

Customer Obsessed: Understands the customer's needs through observation, questioning, and going to Gemba.

Strategic: Uses data to make informed decisions while anticipating future trends and aligning actions with organizational goals.

Innovation for Impact: Proactively explores new perspectives and experiments to solve day-to-day problems.

Inspiring: Understands how their work contributes to the organization's purpose.

Builds Extraordinary Teams: Actively fosters collaboration by contributing positively, supporting shared goals, helping others succeed, and celebrating team achievements together.

Courageous: Shows strength through action: moves quickly toward goals, embraces uncertainty, speaks up, and perseveres through challenges with confidence and integrity.

Delivers Results: Sets high standards and consistently delivers by focusing priorities, overcoming obstacles, and upholding organizational values.

Adaptable: Applies rigor by working thoroughly and following processes without cutting corners while remaining adaptable.

Lead with FBS: Goes to Gemba and observes real-world processes, not just meetings. Embraces FBS by applying its fundamentals to improve work, engage in kaizen, and continuously grow knowledge and usage.
