Machine Learning Engineer, Training Infrastructure

San Francisco, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About Hedra

Hedra is a pioneering generative media company backed by top investors at Index, A16Z, and Abstract Ventures. We're building Hedra Studio, a multimodal creation platform capable of control, emotion, and creative intelligence.

At the core of Hedra Studio is our Character-3 foundation model, the first omnimodal model in production. Character-3 jointly reasons across image, text, and audio for more intelligent video generation. It's the next evolution of AI-driven content creation.

At Hedra, we're a team of hard-working, passionate individuals seeking to fundamentally change content creation and build a generational company together. We value startup energy, initiative, and the ability to turn bold ideas into real products. Our team is fully in-person in SF/NY, with a shared love for whiteboard problem-solving.

Overview

We are looking for an ML Engineer with 3 years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale, supporting our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement. We value curiosity, creativity, and the drive to solve hard problems.

Responsibilities

Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.
Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.
Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.
Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.
Values engineering processes and version control (CI/CD).
Knowledge of containerization technologies like Docker and Kubernetes, required for deployments at scale.
Understanding of distributed training techniques and how to scale models across multi-node clusters, aligning with video generation needs.
Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

This role is vital for ensuring the computational backbone supports the company's ML efforts, focusing on deployment and scalability.

Benefits

Competitive compensation and equity
401k (no match)
Healthcare (Silver PPO Medical, Vision, Dental)
Lunch and snacks at the office
