Salary Not Disclosed
1 Vacancy
As a Remote Analytics Observability Engineer, you will be responsible for designing, implementing, and maintaining end-to-end observability solutions that ensure visibility into analytics systems, data pipelines, and application performance. You will work cross-functionally with engineering, data science, DevOps, and SRE teams to enable proactive monitoring, alerting, logging, and tracing across data infrastructure.
This role plays a critical part in ensuring data reliability, system uptime, and actionable insights by building tools and dashboards that improve system transparency and performance.
Key Responsibilities:
Architect and implement observability solutions for analytics platforms and data pipelines (ETL/ELT, streaming, batch)
Integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) into analytics environments (Spark, Airflow, dbt, etc.)
Design real-time dashboards and alerts that capture system health, job failures, data anomalies, and latency issues
Analyze telemetry data to identify performance degradation, failures, or capacity bottlenecks
Enable distributed tracing for data flows using OpenTelemetry, Jaeger, or similar technologies
Collaborate with Data Engineering and Site Reliability Engineering (SRE) teams to build scalable and fault-tolerant observability stacks
Drive observability best practices and help teams adopt instrumentation standards
Write infrastructure-as-code to deploy monitoring systems (Terraform, Helm, Kubernetes, etc.)
Required Qualifications:
Bachelor's degree in Computer Science, Data Engineering, or a related field
2 years of experience in observability, SRE, DevOps, or DataOps
Deep knowledge of observability tools such as Grafana, Prometheus, Datadog, Splunk, New Relic, or Honeycomb
Familiarity with monitoring cloud-based data systems (AWS/GCP/Azure) and platforms like Snowflake, BigQuery, Redshift, or Databricks
Proficiency in scripting and automation (e.g., Python, Bash)
Experience with infrastructure management and orchestration tools (Kubernetes, Terraform, Helm)
Strong analytical and debugging skills using telemetry data
Full Time