About the Role/Specialty
As a Senior Machine Learning Systems Engineer, you'll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You'll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across the compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva's next generation of products.
What you'll do (responsibilities)
- You'll design, implement, and optimize large-scale machine learning systems for training and inference.
- You'll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
- You'll partner with research and modeling teams to align systems with algorithmic needs.
- You'll evaluate and apply best practices for distributed training using industry-leading frameworks.
- You'll dive deep into low-level optimization, including custom CUDA or Triton kernels.
- You'll debug, profile, and fine-tune training workflows to unlock new levels of scalability.
Qualifications:
What we're looking for
We're looking for a systems-first engineer who thrives in fast-paced, high-impact environments. You're deeply familiar with distributed model training at scale and understand the nuances of optimizing compute at every level of the stack. You're excited by challenges that stretch current boundaries, and you're a strong collaborator who communicates clearly across domains.
- Strong background in LLMs, multimodal AI, or diffusion models.
- Proficiency in Python. Familiarity with a systems programming language (e.g., C or Rust) is a plus.
- Deep knowledge of PyTorch or JAX, as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
- Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
- Hands-on experience writing custom GPU kernels in CUDA or Triton.
- Excellent communication and problem-solving skills, including full proficiency in English.
Remote Work:
Yes
Employment Type:
Full-time