Software Engineer, AI Platform

Job Location:

Mountain View, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

This role can be based in Mountain View, CA; San Francisco, CA; or Bellevue, WA.

Join us to push the boundaries of scaling large models together. The team is responsible for scaling LinkedIn's AI model training, feature engineering, and serving, with models of hundreds of billions of parameters and large-scale feature engineering infra, for all AI use cases, from recommendation models and large language models to computer vision models. We optimize performance across algorithms, AI frameworks, data infra, compute software, and hardware to harness the power of our GPU fleet with thousands of the latest GPU cards. The team also works closely with the open source community and has many open source committers (TensorFlow, Horovod, Ray, vLLM, Hugging Face, DeepSpeed, etc.) on the team. Additionally, this team focuses on technologies like LLMs, GNNs, Incremental Learning, Online Learning, and serving performance optimizations across billions of user queries.

Model Training Infrastructure: As an engineer on the AI Training Infra team, you will play a crucial role in building the next-gen training infrastructure to power AI use cases. You will design and implement high-performance data I/O, work with open source teams to identify and resolve issues in popular libraries like Hugging Face, Horovod, and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and optimize deep learning training, and provide advanced support for internal AI teams in areas like model parallelism, tensor parallelism, ZeRO, etc. Finally, you will assist in and guide the development of containerized pipeline orchestration infrastructure, including developing and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep learning frameworks and their companion libraries like TensorFlow, PyTorch, DeepSpeed, GNNs, Flash Attention, PyTorch Lightning, and more.

Feature Engineering: This team shapes the future of AI with the state-of-the-art Feature Platform, which empowers AI users to effortlessly create, compute, store, consume, monitor, and govern features within online, offline, and nearline environments, optimizing the process for model training and serving. As an engineer on the team, you will explore and innovate within the online, offline, and nearline spaces at scale (millions of QPS, multiple terabytes of data, etc.), developing and refining the infrastructure necessary to transform raw data into valuable feature insights. Utilizing leading open-source technologies like Spark, Beam, Flink, and more, you will play a crucial role in processing and structuring feature data, ensuring its optimal storage in the Feature Store, and serving feature data with high performance.

Model Serving Infrastructure: This team builds low-latency, high-performance applications serving very large and complex models, spanning LLMs and personalization models. As an engineer, you will build compute-efficient infra on top of native cloud, enable GPU-based inference for a large variety of use cases, apply CUDA-level optimizations for high performance, and enable on-device and online training. Challenges include scale (tens of thousands of QPS, multiple terabytes of data, billions of model parameters), agility (experimenting with hundreds of new ML models per quarter using thousands of features), and enabling GPU inference at scale.

MLOps: The MLOps and Experimentation team is responsible for the infrastructure that runs MLOps and experimentation systems across LinkedIn. From Ramping to Observability, this org powers the AI products that define LinkedIn. This team inside MLOps is responsible for AI Metadata, Observability, Orchestration, Ramping, and Experimentation for all models, building tools that enable our product and infrastructure engineers to optimize their models and deliver the best performance possible.

As a Software Engineer, you will have first-hand opportunities to advance one of the most scalable AI platforms in the world. At the same time, you will work together with our talented teams of researchers and engineers to build your career and your personal brand in the AI industry.

Responsibilities

Designing, implementing, and optimizing the performance of large-scale distributed serving or training for personalized recommendation as well as large language models.
Improving the observability and understandability of various systems, with a focus on improving developer productivity and system sustenance.
Partnering with peers, leads, and partners to define scope, prioritize, and build impactful features at a high velocity.

Qualifications

Basic Qualifications

Bachelor's Degree in Computer Science or a related technical discipline, or equivalent practical experience
1 year of experience in the industry leading/building deep learning systems
Experience with Java, C, Python, Go, Rust, C#, and/or functional languages such as Scala, or other relevant coding languages
Experience or qualifications in Machine Learning and AI

Preferred Qualifications

2 years of relevant work experience
MS or PhD in Computer Science or a related technical discipline
Experience building ML applications, LLM serving, and GPU serving
Experience with search systems or similar large-scale distributed systems
Experience with feature engineering and distributed data processing engines like Flink, Beam, Spark, etc.
Experience in distributed machine learning training infrastructure, including technologies like Horovod, PyTorch FSDP, DeepSpeed, Hugging Face, PyTorch Lightning, LLMs, GNNs, MLflow, and Kubeflow, and in large-scale distributed systems
Familiarity with containers and container orchestration systems like Kubernetes
Experience in deep learning frameworks and tensor libraries like PyTorch, TensorFlow, and JAX/Flax

Suggested Skills

ML Algorithm Development
Experience in Machine Learning and Deep Learning
Distributed Systems

You Will Benefit from Our Culture

We strongly believe in the well-being of our employees and their families. That is why we offer generous health and wellness programs and time away for employees of all levels.

LinkedIn is committed to fair and equitable compensation practices. The pay range for this role is $114,000 - $189,000. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to skill set, depth of experience, certifications, and specific work location. This may be different in other locations due to differences in the cost of labor.

The total compensation package for this position may also include annual performance bonus, stock, benefits, and/or other applicable incentive compensation plans.

Equal Opportunity Statement

We seek candidates with a wide range of perspectives and backgrounds, and we are proud to be an equal opportunity employer. LinkedIn considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.

LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. Our goal is to foster an inclusive and accessible workplace where everyone has the opportunity to be successful.

If you need a reasonable accommodation to search for a job opening, apply for a position, or participate in the interview process, connect with us and describe the specific accommodation requested for a disability-related limitation.

Reasonable accommodations are modifications or adjustments to the application or hiring process that would enable you to fully participate in that process. Examples of reasonable accommodations include, but are not limited to:

Documents in alternate formats or read aloud to you
Having interviews in an accessible location
Being accompanied by a service dog
Having a sign language interpreter present for the interview

A request for an accommodation will be responded to within three business days. However, non-disability related requests, such as following up on an application, will not receive a response.

LinkedIn will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by LinkedIn, or (c) consistent with LinkedIn's legal duty to furnish information.

San Francisco Fair Chance Ordinance

Pursuant to the San Francisco Fair Chance Ordinance, LinkedIn will consider for employment qualified applicants with arrest and conviction records.

Pay Transparency Policy Statement

As a federal contractor, LinkedIn follows the Pay Transparency and non-discrimination provisions described at this link:

Data Privacy Notice for Job Candidates

Please follow this link to access the document that provides transparency around the way in which LinkedIn handles personal data of employees and job applicants:

Employment Type:

Full-time
