About Our Internship Program
Zoox's internship program offers hands-on experience with cutting-edge technology, mentorship from some of the industry's brightest minds, and the opportunity to make meaningful contributions to real projects. We seek interns who demonstrate strong academic performance, engagement beyond the classroom, intellectual curiosity, and a genuine interest in Zoox's mission.
Project Overview
During this internship, you will lead the development of a multi-modality (vision, LiDAR, radar, and language) temporal foundation encoder to support 3D object detection and tracking, 3D segmentation (occupancy), and live maps. This Multi-Modal Foundation Encoder (MMFE) is critical to achieving End-to-End Perception at Zoox.
Your research will aim to significantly improve system performance on long-tail events and rare classes by using a large-capacity foundation model to learn rich representations across different sensor modalities. Additionally, the project aims to improve perception in adverse environmental conditions (such as medium to heavy rain and fog, reducing false positives on water splashes or dust particles), achieve long-range sensing for highway driving, and build robustness to occlusion.
This is a highly research-driven role with the goal of publication. You will have the opportunity to explore novel directions such as tri-modal foundation models with self-supervised pre-training, radar-language grounding for zero-shot detection, efficient sensor fusion via sparse cross-attention, or integrating 3D Gaussian Splats for dynamic agent geometry and streaming sparse Gaussian occupancy prediction.
Requirements:
Currently working towards a Ph.D. or advanced degree in a relevant engineering program
Good academic standing
Able to commit to a 12-week internship during one of the following summer 2026 cohorts: May 18th - August 7th, May 26th - August 14th, or June 15th - September 4th
At least one previous industry internship, co-op, or project completed in a relevant area
Ability to relocate to the Bay Area, California (or Boston, Massachusetts) for the duration of the internship
Interns at Zoox may not use any proprietary information they are working on in their thesis or in any published work with their university, and may not distribute it to anyone outside of Zoox
Qualifications (It's helpful if you meet a majority of the following qualifications, but it isn't a requirement):
Currently enrolled in a Ph.D. program in Computer Science, Electrical/Computer Engineering, Robotics, or a related field, with a focus on Deep Learning, Computer Vision, or Autonomous Driving.
Publication record in top-tier AI/Robotics conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICLR, ICRA).
Prior experience designing and training foundation models (such as World Models, VLMs, LLMs, or VLAs) using large-scale multi-modal autonomous driving datasets.
Hands-on experience developing deep learning models for 3D object detection, tracking, or 3D segmentation.
Experience working with multi-modal sensor data, specifically combining representations from vision, LiDAR, radar, or language.
Bonus Qualifications
Experience with 4D radar object detection or handling sensor data in adverse weather conditions.
Experience with cross-modal alignment for zero-shot detection.
Familiarity with 3D Gaussian Splatting, voxel grid representations, or streaming sparse occupancy prediction.
Experience with self-supervised pre-training or masked sensor modeling.
Compensation:
The monthly salary for this position is $9,500. Compensation will vary based on geographic location. Additional benefits may include medical insurance and a housing stipend (relocation assistance will be offered based on eligibility).
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Required Experience:
Intern