Salary Not Disclosed
1 Vacancy
We are seeking a skilled, detail-oriented Observability Engineer to join our remote team. In this role, you will design, implement, and maintain observability solutions that ensure the high availability, performance, and reliability of our systems. Your work will give teams real-time insight through metrics, logs, and traces, helping drive faster incident response and a better understanding of our systems.
Key Responsibilities:
Develop and maintain observability platforms (e.g., Prometheus, Grafana, OpenTelemetry, ELK, Datadog, New Relic)
Design and implement monitoring strategies across distributed systems
Collaborate with DevOps, SRE, and engineering teams to define SLIs, SLOs, and dashboards
Create automated alerts and integrations to improve incident detection and resolution
Analyze performance data and logs to identify trends, bottlenecks, and areas for optimization
Ensure observability tools are reliable, secure, and scalable
Provide documentation and training to empower teams to use observability tools effectively
Qualifications:
2 years of experience in observability, monitoring, or site reliability engineering
Strong knowledge of monitoring, logging, and tracing tools and best practices
Experience with cloud infrastructure (AWS, GCP, or Azure)
Proficiency in scripting or automation (e.g., Python, Bash, Terraform)
Excellent problem-solving skills and the ability to work independently in a remote setting
Full-time