Job Summary
We are seeking an experienced Observability Specialist to design, implement, and optimize enterprise observability solutions across complex distributed environments. The ideal candidate will have deep hands-on expertise in Splunk, AppDynamics, and Dynatrace, with a strong understanding of modern cloud-native architectures, DevOps practices, and performance engineering.
This role is responsible for ensuring end-to-end visibility across applications, infrastructure, and user experience, enabling proactive monitoring, faster incident resolution, and improved system reliability.
Key Responsibilities
Observability Architecture & Strategy
- Design and implement enterprise observability frameworks and standards.
- Develop monitoring strategies aligned with SRE and DevOps best practices.
- Define SLIs, SLOs, and KPIs to measure application and system health.
- Establish logging, metrics, tracing, and alerting standards.
Platform Implementation & Administration
- Deploy, configure, and manage:
  - Splunk Enterprise / Splunk Cloud
  - AppDynamics APM
  - Dynatrace platform
- Integrate observability tools with CI/CD pipelines and ITSM platforms.
- Develop dashboards, alerts, and custom monitoring solutions.
- Manage data onboarding, parsing, and normalization in Splunk.
- Configure business transaction monitoring and application performance baselines.
Performance & Reliability Engineering
- Perform root cause analysis using logs, metrics, and traces.
- Optimize application and infrastructure performance.
- Support incident response, troubleshooting, and post-mortems.
- Implement synthetic and real-user monitoring strategies.
Automation & Integration
- Automate monitoring configuration using Infrastructure as Code (Terraform, Ansible).
- Develop APIs and scripts (Python, Bash) for automation and integration.
- Integrate observability platforms with Kubernetes, Docker, and cloud providers (AWS, Azure, GCP).
Governance & Best Practices
- Establish observability governance, data retention, and cost optimization strategies.
- Provide guidance to development and operations teams on instrumenting applications.
- Conduct training sessions and produce documentation for internal stakeholders.
Required Qualifications
- 5 years of experience in monitoring, observability, or SRE roles.
- 3 years of hands-on experience with:
  - Splunk (Search Processing Language, SPL)
  - AppDynamics (Controller, Agents, Business Transactions)
  - Dynatrace (OneAgent, Smartscape, AI engine)
- Strong understanding of:
  - Microservices architecture
  - Kubernetes and containerized environments
  - Cloud platforms (AWS, Azure, or GCP)
  - Linux/Unix systems
- Experience with logging, distributed tracing, and metrics collection.
- Strong scripting skills (Python, Bash, or PowerShell).
- Experience with CI/CD pipelines (Jenkins, GitLab CI, etc.).
Preferred Qualifications
- Splunk Certified Architect / Power User
- AppDynamics or Dynatrace certifications
- Experience with OpenTelemetry
- Knowledge of ITIL processes
- Experience in regulated industries (Finance, Healthcare, Telecom)
Key Competencies
- Analytical thinking and strong troubleshooting skills
- Ability to work in high-availability production environments
- Strong communication and stakeholder engagement skills
- Proactive mindset with a focus on automation and continuous improvement
What Success Looks Like
- Reduced MTTR through proactive monitoring
- Improved system uptime and performance visibility
- Clear, actionable dashboards for business and technical stakeholders
- Standardized observability practices across teams