Datadog has recently expanded its AI Research initiatives. Building on our proven track record of AI-powered solutions (e.g., Bits AI, Watchdog, and Toto), our research team is tackling high-risk, high-reward projects grounded in real-world challenges in cloud observability and security.
We are currently focused on three key research areas:
- Observability Foundation Models: Building state-of-the-art models for advanced forecasting, anomaly detection, and multimodal telemetry analysis (logs, metrics, traces, etc.). These models will also provide the foundation for our agents (described below) to natively analyze telemetry data.
- Site Reliability Engineering (SRE) Autonomous Agents: Creating AI agents to automatically detect, diagnose, and resolve incidents in production environments, pushing the boundaries of multi-step planning, reasoning, and domain-specific knowledge.
- Production Code Repair Agents: Developing agents and models that leverage code, logs, runtime data, and other signals to identify, fix, and even preempt performance issues and security vulnerabilities in production code.
As a researcher on our team, you will help drive these efforts, working on fundamental research problems and collaborating with Datadog's Product and Engineering teams to help translate research advances into tangible benefits for our customers.
What You'll Do:
- Conduct cutting-edge research in Generative AI and Machine Learning, aiming to build specialized Foundation Models and AI Agents for observability, site reliability engineering, and code repair
- Leverage large-scale distributed training infrastructure to train and fine-tune state-of-the-art models on diverse, real-world telemetry data
- Lead and contribute to research publications, present findings at top-tier conferences (e.g., NeurIPS, ICLR, ICML), and help open-source key model artifacts and benchmarks
- Collaborate with cross-functional teams (e.g., Product, Engineering) to integrate advanced AI capabilities, such as multimodal analysis or automated incident resolution planning, into Datadog's product ecosystem
- Stay at the forefront of LLMs, Foundation Models, and Generative AI research, and engage with the external research community
- Foster a culture of scientific rigor, innovation, and practical impact, e.g., by actively participating in reading groups and mentoring interns
Who You Are:
- You hold a PhD in Computer Science, Machine Learning, or a related field, with deep expertise in areas like generative modeling, AI agents, reinforcement learning, or natural language processing (or have equivalent experience)
- You possess extensive experience in designing and implementing deep learning models and have a strong background in distributed training frameworks (e.g., DeepSpeed, Megatron-LM) and ML libraries (PyTorch, TensorFlow)
- You have a proven track record of conducting impactful research in the field, with publications at top-tier venues (e.g., NeurIPS, ICLR, ICML, TMLR)
- You're familiar with efficient training, fine-tuning, and inference techniques for large foundation models
- You excel at explaining complex models and research findings to both technical and non-technical audiences
- You have a strong interest in open science and open-source contributions, including establishing rigorous benchmarks and sharing research with the community
Bonus Points (any of the following):
- You have a demonstrated ability to bridge cutting-edge research and real-world product applications, ideally with an emphasis on large foundation models, generative AI agents, or domain-specific LLM deployments
- You're passionate about pushing the boundaries of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies
- You have hands-on experience with GPU programming and optimization, including experience in CUDA
- You have experience writing production data pipelines and applications
Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you're passionate about AI Research and want to grow your skills, we encourage you to apply.
Benefits and Growth:
- Competitive global benefits
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
- Opportunity to collaborate closely with colleagues across the Datadog offices in New York City and Paris
- Opportunity to attend and present at conferences and meetups
- Intra-departmental mentor and buddy program for in-house networking
- An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups)
Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.