The customer currently uses the ELK stack. The goal is to standardize and modernize logs, metrics, and traces using OpenTelemetry while improving visibility, reliability, and operational intelligence.
Observability Architecture & Modernization
Assess the existing ELK-based observability setup and define a modern observability architecture
Design and implement standardized logging, metrics, and distributed tracing using OpenTelemetry
Define observability best practices for cloud-native and Azure-based applications
Ensure consistent telemetry collection across microservices, APIs, and infrastructure
Logging, Metrics & Tracing
Instrument applications using OpenTelemetry SDKs (Python, JavaScript, as applicable)
Support Kubernetes and container-based workloads (if applicable)
Configure and optimize log pipelines, trace exporters, and metric collectors
Integrate OpenTelemetry with ELK / OpenSearch / Azure Monitor / other backends
Define SLIs, SLOs, and alerting strategies
Integrate GitHub and Jira metrics as DORA metrics into the observability platform
Operational Excellence
Improve observability performance, cost efficiency, and data retention strategies
Create dashboards, runbooks, and documentation
AI-based Anomaly Detection & Triage (Good to Have)
Design or integrate AI/ML-based anomaly detection for logs, metrics, and traces
Experience with AIOps capabilities for automated incident triage and insights
Required Technical Skills
Core Observability
Strong hands-on experience with ELK Stack (Elasticsearch, Logstash, Kibana)
Deep understanding of logs, metrics, traces, and distributed systems
Practical experience with OpenTelemetry (Collectors, SDKs, exporters, receivers)
Cloud & Platforms
Strong experience with Microsoft Azure, including integration with the observability platform
Knowledge of Azure monitoring tools (Azure Monitor, Log Analytics, Application Insights)
Experience integrating Kubernetes / AKS with the observability platform is a strong plus
Soft Skills
Strong architecture and problem-solving skills
Clear communication and documentation skills
Hands-on mindset with an architect-level view
Good to Have / Preferred Skills
Experience with AIOps / anomaly detection platforms
Exposure to tools like Prometheus, Grafana, Jaeger, OpenSearch, Datadog, Dynatrace, or New Relic (any)
Experience with incident management, SRE practices, and reliability engineering
Additional Information
Experience Level: 5-8 Years
Location: Chennai
Remote Work: No
Employment Type: Full-time
About CRUX
Crux Consulting Services is a growing organization in the consulting industry, focused on delivering high-quality services to our clients. We are looking for a motivated and enthusiastic HR - IT Recruiter Trainee to join our HR team and support our IT recruitment efforts.