Senior Software Engineer – Training & Registry (AI Platform)

Job Location

Paris - France

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

At Datadog, we are building a next-generation AI platform that enables seamless training, tracking, and deployment of ML and LLM models at scale. The Training & Registry team is responsible for the infrastructure and tooling that allows applied scientists to iterate rapidly and reliably: managing training jobs, tracking experimentation, and versioning model artifacts across distributed systems.

Our work is foundational to AI development at Datadog, powering everything from classic ML workflows to large-scale LLM fine-tuning and embedding generation. We build deeply technical infrastructure: distributed systems for job orchestration, model lifecycle management, and training observability. This is a high-impact team working on problems critical to Datadog's AI evolution.

We're looking for a Senior Software Engineer to design and build robust backend and platform systems that drive model experimentation and registry workflows. In this role, you'll collaborate with platform teams, applied scientists, and infra stakeholders to shape the future of AI infrastructure at Datadog.

At Datadog, we place value in our office culture - the relationships that it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.

What You'll Do:

  • Design and implement scalable, reliable systems for training orchestration, artifact tracking, and model registration across multiple data centers and cloud regions.
  • Improve and streamline ML experimentation workflows by integrating tooling like Ray, Airflow, and interactive notebooks.
  • Develop APIs and services that enable applied scientists to seamlessly launch, debug, and track training jobs.
  • Ensure reproducibility and traceability by building robust version control and metadata systems for model artifacts.
  • Collaborate with AI infra teams (LLMObs, Compute, etc.) to deliver consistent user experiences and integrated telemetry.
  • Mentor engineers and help drive architectural decisions and technical standards.

Who You Are:

  • You have 6 years of experience in backend, distributed systems, or platform engineering roles.
  • You have worked on ML platforms or infrastructure, ideally supporting real-world training or model lifecycle workflows.
  • You're comfortable designing APIs, managing data at scale, and architecting systems for reliability and observability.
  • You're fluent in Python or Go and have experience with cloud-native tools (e.g., Kubernetes, object stores, queueing systems).
  • You're comfortable navigating cross-functional environments and translating scientific requirements into reliable systems.
  • Bonus points: experience with model registries, experiment tracking tools (e.g., MLflow, Weights & Biases), or distributed training frameworks.

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you're passionate about technology and want to grow your skills, we encourage you to apply.

Benefits and Growth:

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free global mental health benefits for employees and dependents age 6
  • Competitive global benefits

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.


Required Experience:

Senior IC

Employment Type

Full Time
