Datadog Administration and Operations (ServiceNow)

HP


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 4 days ago
Vacancies: 1 Vacancy

Job Summary

Datadog Administration and Operations (ServiceNow)

Description -

We're seeking a Datadog administration and operations expert who will be responsible for managing our observability platform to ensure comprehensive monitoring, alerting, and performance analytics across infrastructure and applications. This role is critical for maintaining system reliability, improving incident response, and supporting DevOps and engineering teams with actionable insights. This is an exciting opportunity to get in on the ground floor, implementing and scaling the tools, processes, and governance at HP.

Responsibilities

  • Observability Architecture & Ownership

    • Design and implement an enterprise-grade observability strategy spanning Datadog (metrics, logs, traces/APM, synthetics, RUM, network performance, cloud cost) and integrations with ServiceNow.

    • Define monitoring standards, tagging conventions, dashboards, SLOs/SLIs, and alerting policies for infra and apps (on-prem, cloud, containers).

  • Datadog Implementation & Scale

    • Deploy and manage Datadog agents, integrations (AWS/Azure/GCP, Kubernetes, NGINX, DBs, messaging), and service catalog coverage.

    • Build golden dashboards, standardized monitors, and runbooks for infra components (compute, storage, network), platforms (Kubernetes), and critical apps.

  • ServiceNow Integration & Event Management

    • Implement and optimize Datadog-to-ServiceNow event routing, correlation rules, deduplication, and Incident/Problem auto-creation with enriched context.

    • Maintain CI relationships in ServiceNow CMDB, drive discovery mapping, and align alerts with CI ownership and support groups.

    • Enable closed-loop remediation using IntegrationHub workflows and change controls; contribute to Change Advisory Board (CAB) standards.

  • Reliability Engineering & Operational Excellence

    • Maintain SLOs, error budgets, and escalation policies. Reduce alert noise; drive actionable, tiered alerts.

    • Partner with App, Infra, SecOps, and NOC teams to improve MTTR and post-incident reviews with telemetry-backed corrective actions.

  • Automation & IaC

    • Automate provisioning of monitors, dashboards, synthetics, tags, and service owner mapping.

    • Build runbooks, remediation scripts, and service workflows; integrate with CI/CD to promote consistent monitoring across environments.

  • Governance, Compliance & Cost Optimization

    • Implement data retention policies, access controls, RBAC, and tagging for chargeback/showback.

    • Optimize Datadog usage (APM sampling, log pipelines/archives, metric volumes) while protecting critical visibility.

Preferred Education & Experience

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or equivalent experience.

  • 5-8 years in Infrastructure/Platform/SRE/Observability roles for enterprise environments.

  • Expert hands-on Datadog: agents, integrations, logs, pipelines, APM/tracing (including OpenTelemetry), RUM, synthetics, dashboards, monitors, service catalogs, tagging strategies.

  • ServiceNow: Event Management, Incident/Problem/Change, CMDB design, Discovery, integration patterns (webhooks, APIs, IntegrationHub), event correlation and enrichment.

  • Strong experience across Linux/Windows/Unix (cluster and workload monitoring).

  • Proficiency with scripting (Python/PowerShell/Bash), Datadog/ServiceNow APIs, and Git-based workflows.

  • Demonstrated capability to design SLOs/SLIs, reduce false positives, and measurably improve MTTR and service reliability.

  • Excellent communication; able to drive standards across multiple engineering teams.

Additional Qualifications

  • Experience across AWS/Azure/GCP, Kubernetes, Terraform.

  • Prior ownership of enterprise observability programs (>500 nodes/services; multi-account/multi-subscription cloud).

  • Network (e.g. NPM/NTA) and database monitoring expertise (e.g. Postgres/SQL Server/Oracle/MySQL).

  • Experience with message brokers (Tibco), API gateways, and distributed tracing for microservices.

  • Basic experience in administering and maintaining relational and/or non-relational databases.

  • Security/Compliance awareness (SOX, HIPAA, PCI), log retention/archival strategies.

  • Experience with cost governance in Datadog (metrics vs. logs vs. traces), custom metrics, and sampling strategies.

  • ITIL v4 Foundation, Datadog Certifications, and ServiceNow Admin/Developer certifications.

Knowledge & Skills

  • Systems thinking, reliability engineering mindset, data-driven decision making.

  • Strong stakeholder collaboration (Infra, AppDev, SecOps, NOC).

  • Documentation and enablement: clear runbooks, patterns, standards.

  • Bias for automation, consistency, and measurable outcomes.

Job -

Software

Schedule -

Full time

Shift -

No shift premium (India)

Travel -

Relocation -

Equal Opportunity Employer (EEO) -

HP Inc. provides equal employment opportunity to all employees and prospective employees, without regard to race, color, religion, sex, national origin, ancestry, citizenship, sexual orientation, age, disability, or status as a protected veteran, marital status, familial status, physical or mental disability, medical condition, pregnancy, genetic predisposition or carrier status, uniformed service status, political affiliation, or any other characteristic protected by applicable national, federal, state, and local law(s).

Please be assured that you will not be subject to any adverse treatment if you choose to disclose the information requested. This information is provided voluntarily. The information obtained will be kept in strict confidence.

For more information, review HP's EEO Policy or read about your rights as an applicant under the law here: Know Your Rights: Workplace Discrimination is Illegal
