Senior AI Engineer APM Experiences

Datadog


Job Location:

Paris - France

Monthly Salary: Not Disclosed
Posted on: 10 hours ago
Vacancies: 1 Vacancy

Job Summary

Datadog's APM Experiences team owns the core product experience for Application Performance Monitoring, including distributed tracing, service representation, and more. We're building a new wave of AI-powered capabilities that help customers detect, resolve, and prevent performance issues. In this role, you will lead end-to-end development of LLM- and agent-based features that can:

  • Debug and investigate application performance issues down to the root cause, both as a developer assistant and as a fully autonomous agent
  • Proactively recommend performance and reliability-based optimizations to prevent the next incident
  • Automatically create intelligent monitors and SLOs for the most important business flows and critical paths

This is a highly product-minded engineering role: you'll work from problem discovery and UX all the way to reliable, scalable production systems.

What you'll do:

  • Shape AI experiences for APM. Design and ship LLM/agentic workflows that analyze traces, metrics, logs, and other telemetry to generate diagnoses, explanations, and guided fixes.
  • Own the full loop. Prototype quickly, define success metrics and evals, run experiments, iterate, and ultimately productionize for scale and reliability.
  • Build robust agent systems. Develop tools, retrieval and planning strategies, and guardrails; manage prompts and evals; design fallbacks and human-in-the-loop paths.
  • Integrate with Datadog's platform. Leverage surfaces like Trace Explorer, Service Catalog, monitors, and workflows to deliver end-to-end value in the APM UI.
  • Partner deeply. Collaborate with PM, Design, and partner teams to build cohesive experiences.
  • Raise the bar on engineering. Write performant, maintainable backend code, own services in production, and improve reliability for high-throughput, low-latency data systems.

Who you are:

Product-minded engineer who ships AI to production

  • 4 years of experience building backend or real-time ML systems; you value simplicity, correctness, and performance
  • Proven experience delivering LLM/agent features to production (prompting, tooling, evals, safety/guardrails)
  • Comfortable owning user journeys, iterating from prototype to alpha to GA, and measuring impact with clear product metrics

Strong ML / applied science fundamentals

  • Solid grasp of the ML lifecycle (task definition, dataset collection, modeling, evaluation, deployment, iteration) and statistics (experiment design, confidence intervals)
  • Experience choosing and modeling the right technique for the job (e.g., anomaly detection, ranking/recommendation, NLP) and knowing when a heuristic beats a model
  • Fluency with offline and online evals for AI systems; able to build reliable golden sets and automated regression checks

Distributed systems & observability savvy

  • Experience with microservices performance: tracing, latency breakdowns, concurrency, and resiliency patterns
  • Proficient in Go, Java, or Python; strong API/service design; production ops (monitoring, alerting, on-call rotation)

Nice to have

  • Hands-on with distributed tracing stacks (OpenTelemetry/Datadog APM), profilers, and logs/metrics pipelines
  • Exposure to planning/agent frameworks, tool-use orchestration, RAG, and retrieval/indexing for observability data
  • Familiarity with SLO/SLA practices and incident response

Benefits and Growth:

  • Get to build tools for software engineers just like yourself, and use the tools we build to accelerate our development.
  • Have a lot of influence on product direction and impact on the business.
  • Work with skilled, knowledgeable, and kind teammates who are happy to teach and learn.
  • Competitive global benefits.
  • Continuous professional development.

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.


Required Experience:

Senior IC


Key Skills

  • APIs
  • C/C++
  • Computer Graphics
  • Go
  • React
  • Redux
  • Node.js
  • AWS
  • Library Services
  • Assembly
  • GraphQL
  • High Voltage

About Company


See inside any stack, any app, at any scale, anywhere.
