Enterprise Observability Specialist

CES Limited

Posted on: 23-06-2025

1 Vacancy

Job Location

Dallas - USA

Monthly Salary

Not Disclosed

Job Description

The Enterprise Observability Specialist is a mid-level position within the Unified Digital Intelligence function's Enterprise Observability team, responsible for maintaining the resilience and performance of critical infrastructure, applications, and websites. This role involves designing and implementing Level 0 automation and Level 1 monitoring solutions, conducting Level 2 advanced troubleshooting, coordinating incident response, and administering, configuring, and deploying observability tools, primarily Dynatrace, along with ThousandEyes, Evolven, and others as required. The specialist supports the event intelligence platform and collaborates with cross-functional teams to monitor on-premise and cloud environments, contributing to the team's 24x7 monitoring operations. Additional responsibilities include maintaining shift operator logs and performing handover duties to ensure seamless team functionality.

Key Roles & Responsibilities

The incumbent, directly or through collaboration, will:

Provide Level 1 support by monitoring alerts and resolving basic issues across critical websites, web applications, and infrastructure, escalating unresolved issues to Level 2 as needed.
Perform Level 2 advanced diagnostics (e.g., log analysis, performance troubleshooting) to resolve complex issues such as misconfigurations and performance bottlenecks.
Design, configure, and administer observability tools, including Dynatrace, ThousandEyes, Evolven, and others, to ensure optimal monitoring capabilities.
Implement Level 0 automation processes (e.g., automated ticket creation and routing) to improve alert response efficiency.
Develop observability processes across infrastructure, production applications, and websites to ensure robust anomaly detection and situational awareness.
Integrate application and infrastructure data with the event intelligence platform and other observability tools to enable automated incident handling.
Lead incident response coordination, leveraging observability data and analytics to accelerate recovery and keep stakeholders informed.
Maintain comprehensive shift operator logs documenting incidents, tool configurations, actions taken, and escalations during assigned shifts.
Conduct clear and thorough handovers to the next shift, ensuring ongoing issues, system status, and pending actions are communicated effectively.
Deliver training sessions for monitoring teams on configuration, administration, and usage of observability tools and the event intelligence platform, supporting Level 1 and Level 2 functions.
Maintain thorough documentation of processes, configurations, and incident responses to foster knowledge sharing and reduce future downtime.
Troubleshoot application and infrastructure performance issues, offering performance tuning recommendations when necessary.
Support the implementation of SLO/SLI metrics in collaboration with support teams and application owners.
Participate in shift-based operations, including weekends and after-hours support, as part of a rotating schedule.
Obtain or maintain a relevant Dynatrace Specialist certification (e.g., Application Performance Monitoring Specialist, Infrastructure Monitoring Specialist) within 6 months of hire.

Education, Experience & Skill Requirements

Preferred educational background includes a Bachelor's or Associate's degree in Technology, Engineering, or a related field.
4-7 years of experience in Information Systems with a focus on application and infrastructure monitoring.
Strong expertise in designing and managing observability tools, particularly Dynatrace, with familiarity in ThousandEyes, Evolven, and event intelligence platforms.
Experience in application development technologies, programming languages, and advanced troubleshooting.
Proficiency across multiple operating systems, including Unix, Linux, and Windows.
Hands-on experience with observability and monitoring tools for servers, applications (real and synthetic), and infrastructure (client/server/logs).
Working knowledge of scripting languages such as Perl, Java, or Python.
Experience with event intelligence platforms for automated incident detection and response.
Strong understanding of cloud-native monitoring strategies and technologies.
Demonstrated ability to script and automate using REST APIs and webhooks.
Practical experience monitoring applications running on Kubernetes and OpenShift clusters, including service tracing and health checks.
Solid understanding of cloud infrastructure monitoring tools and environments (e.g., EC2, Lambda, Azure Functions, GKE, EKS).
Strong analytical and troubleshooting skills to perform Level 2 fault isolation and resolution.
Excellent communication skills to support incident coordination and collaboration with cross-functional technical and operational teams.

Measures of Success

Effective design and implementation of automation and observability processes, leading to reduced recovery times and improved visibility.
Accurate, detailed shift operator logs and smooth handovers supporting uninterrupted team operations.
Coordinated Level 1 and Level 2 incident response, with effective integration of observability data into relevant platforms, enhancing overall system reliability.
Positive feedback from monitoring teams regarding training sessions and tool adoption, along with demonstrated progress toward Dynatrace certification.

Employment Type

Full Time
