Distillation Lead
Job Summary
You will:
- Define and drive the technical strategy for model distillation and compression across Waabi's AI stack, spanning perception, world models, and planning, with an eye toward both onboard deployment and simulation use cases.
- Design, implement, and scale state-of-the-art distillation and efficiency pipelines, which may include:
  - Distillation for generative models (diffusion, autoregressive, flow-matching, and video models)
  - Quantization-aware training (QAT) and post-training quantization (PTQ)
  - Knowledge distillation (feature-level, response-based, and relation-based)
  - Structured and unstructured pruning and sparsification
  - Low-rank factorization and efficient architecture design
  - Speculative decoding and other inference-time efficiency techniques
- Collaborate closely with the ML Platform, Infrastructure, Onboard Autonomy, and Simulation teams to integrate compressed models into production pipelines and meet latency, memory, and throughput targets across deployment contexts.
- Define rigorous benchmarks and evaluation frameworks to characterize efficiency vs. quality trade-offs across models and hardware targets.
- Mentor and guide researchers and engineers working in the distillation and model efficiency space, setting a high technical bar and fostering a culture of rigorous experimentation.
- Champion best practices for model compression across the organization; disseminate knowledge through internal design reviews, documentation, and technical talks.
- Stay at the cutting edge of model efficiency research; contribute to the broader scientific community through publications and open-source contributions.
Qualifications:
- Deep distillation expertise: You have extensive hands-on experience designing and implementing distillation, quantization, pruning, and model compression techniques for large-scale neural networks, with demonstrated impact in production settings.
- Strong research and engineering foundation: A Bachelor's or Master's degree in Machine Learning, Computer Vision, Robotics, or a related field, or equivalent industry experience; relevant hands-on experience in model distillation and efficiency is what matters most. Expert Python and PyTorch (or JAX) skills, with experience in large-scale distributed training.
- Technical leadership: You have a proven track record of setting technical direction and driving projects from conception to production. You inspire and elevate those around you through deep technical expertise and mentorship.
- Cross-functional collaboration: You have experience working closely with infrastructure, platform, and autonomy teams to deploy compressed models under real engineering constraints.
- Clear communicator: You can communicate complex technical trade-offs clearly to diverse audiences and drive alignment across research and engineering teams.
Bonus:
- Experience with hardware-aware optimization (TensorRT, ONNX, custom CUDA kernels, hardware-specific quantization).
- Publications at top-tier ML/CV venues (NeurIPS, ICML, CVPR, ICLR, ECCV) in model compression, efficient deep learning, or related areas.
- Experience distilling large generative models (diffusion models, LLMs, VLMs, or video models).
- Background in autonomous vehicles or robotics.