Sr. Site Reliability Engineer

Job Location: Iselin, NJ - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

Key Responsibilities:

  • Experience required: 12 years.
  • Design and develop enterprise-grade APIs and configuration solutions.
  • Contribute to enterprise and application architecture design.
  • Lead observability initiatives, including monitoring, alerting, and incident response.
  • Build and maintain dashboards and alerting systems using Grafana, Prometheus, Splunk, etc.
  • Create and maintain detailed runbooks for operational procedures and incident handling.
  • Define and monitor SLAs, SLOs, and KPIs for critical services.
  • Collaborate with architecture, development, and security teams to ensure system reliability.
  • Evaluate and adopt new technologies to improve system performance and maintainability.

Required Skills:

  • Strong background in IT infrastructure, cloud platforms (AWS, Azure, GCP), and SRE practices.
  • Experience in enterprise and application architecture.
  • Proven experience in building APIs and backend services.

Hands-on experience with tools:

  • Monitoring & Observability: Grafana, Prometheus, Splunk
  • ITSM & Operations: ServiceNow, OpsRamp
  • Project & Incident Tracking: JIRA
  • Experience in building alerts, dashboards, and operational runbooks.
  • Experience managing distributed systems and large-scale production environments.
  • Strong leadership, communication, and problem-solving skills.
  • Ability to quickly learn and adapt to new technologies and environments.

Preferred:

  • Exposure to OpenShift and Azure cloud platforms.
  • Certifications: SRE Foundation, ITIL, or relevant cloud certifications.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting