At Toyota Research Institute (TRI), we're on a mission to improve the quality of human life. We're developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we've built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavior Models, and Robotics.
The Mission
Make general-purpose robots a reality.
The Challenge
We envision a future where robots assist with household chores and cooking, aid the elderly in maintaining their independence, and enable people to spend more time on the activities they enjoy most. To achieve this, robots must be able to operate reliably in complex, unstructured environments. Our mission is to answer the question: "What will it take to create truly general-purpose robots that can accomplish a wide variety of tasks in settings like human homes with minimal human supervision?" We believe that the answer lies in cultivating large-scale datasets of physical interaction from a variety of sources and building on the latest advances in machine learning to learn general-purpose robot behaviors from this data.
The Team
The Learning From Videos (LFV) team in the Robotics division focuses on the development of foundation models capable of leveraging large-scale multi-modal data (RGB, depth, flow, semantics, bounding boxes, tactile, audio, etc.) from multiple domains (driving, robotics, indoors, outdoors, etc.) to improve the performance of downstream tasks. This paradigm targets training scalability, since data from multiple modalities can be equally leveraged to learn useful data-driven priors (3D geometry, physics, dynamics, etc.) for world understanding. Our topics of interest include, but are not limited to, Video Generation, World Models, 4D Reconstruction, Multi-Modal Models, Multi-View Geometry, Data Augmentation, and Video-Language-Action models, with a primary focus on foundation models for embodied applications. We are aiming to make progress on some of the hardest scientific challenges around spatio-temporal reasoning and how it can lead to the deployment of autonomous agents in real-world, unstructured environments.
The Opportunity
Our Learning From Videos (LFV) team is looking for a Computer Vision Research Scientist with expertise in Video Generation, Spatio-temporal Representation Learning, World Models, Foundation Models, Multi-Modal Learning, Vision-as-Inverse-Graphics (including Differentiable Rendering), or related fields to improve dynamic scene understanding for robots. We are working on some of the hardest scientific challenges around the safe and effective usage of large robotic fleets, simulation, and prior knowledge (geometry, physics, domain knowledge, behavioral science), not only for automation but also for human augmentation.
As a Research Scientist, you will work with a team proposing, conducting, and transferring innovative research. You will use large amounts of sensory data (real and synthetic) to address open problems, train models at scale, publish at top academic venues, and test your ideas in the real world (including on our robots). You will also work closely with other teams at TRI to transfer and ship our most successful algorithms and models towards world-scale, long-term autonomy and advanced assistance systems.
Responsibilities
- Conduct ambitious research that solves high-value problems, and validate the results on well-established benchmarks and systems.
- Push the boundaries of knowledge and the state of the art in ML areas including simulation, perception, prediction, and planning for autonomous driving and robotics.
- Partner with a multidisciplinary team including other research scientists and engineers across the CV team, TRI, Toyota, and our university partners.
- Present results in verbal and written communications: internally, at top international venues, and via open-source contributions to the community.
- Work closely with robotics and machine learning researchers and engineers to understand theoretical and practical needs.
- Lead collaborations with our external research partners and mentor research interns.
- Follow best practices, producing maintainable code both for internal use and for open-sourcing to the scientific community.
Qualifications
- PhD or equivalent years of experience in Machine Learning, Robotics, Computer Vision, or a related field.
- Deep expertise in at least one key ML area among Computer Vision, Large-Scale Pre-Training, Multi-Modal Learning, World Models, and 4D Reconstruction.
- Consistent record of publishing at high-impact conferences/journals (CVPR, ICLR, NeurIPS, RSS, ICRA, ICCV, ECCV, PAMI, IJCV, etc.) on the aforementioned topics.
- Proficient in scientific Python, Unix, and a common DL framework (preferably PyTorch). Experience with distributed learning (especially on AWS) for large-scale training of foundation models is a plus.
- You can identify, propose, and lead new research efforts, working in collaboration with other researchers and engineers to complete them from initial idea to working solution.
- You are intrigued by large-scale challenges in ML, especially in the space of Robotics and Automated Driving, and for societal good in general.
- You are a reliable teammate. You like to think big and go deeper. You care about openness and delivering with integrity.
When submitting your CV for this position, please include a brief cover letter and a link to your Google Scholar profile with a full list of publications.
The pay range for this position at commencement of employment is expected to be between $176,000 and $264,000/year for California-based roles. Base pay offered will depend on multiple individualized factors, including, but not limited to, business or organizational needs, market location, job-related knowledge, skills, and experience. TRI offers a generous benefits package including medical, dental, and vision insurance, 401(k) eligibility, paid time off benefits (including vacation, sick time, and parental leave), and an annual cash bonus structure. Additional details regarding these benefit plans will be provided if an employee receives an offer of employment.
Please reference this Candidate Privacy Notice to inform you of the categories of personal information that we collect from individuals who inquire about and/or apply to work for Toyota Research Institute, Inc. or its subsidiaries, including Toyota A.I. Ventures GP, L.P., and the purposes for which we use such personal information.
TRI is fueled by a diverse and inclusive community of people with unique backgrounds, education, and life experiences. We are dedicated to fostering an innovative and collaborative environment by living the values that are an essential part of our culture. We believe diversity makes us stronger and are proud to provide Equal Employment Opportunity for all, without regard to an applicant's race, color, creed, gender, gender identity or expression, sexual orientation, national origin, age, physical or mental disability, medical condition, religion, marital status, genetic information, veteran status, or any other status protected under federal, state, or local laws.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability. Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records for employment.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.