Software Development Engineer, SageMaker HyperPod Data Plane

Job Location

Santa Clara - USA

Yearly Salary

$129,300 - $223,600

Vacancy

1 Vacancy

Job Description

At AWS AI, we want to make it easy for our customers to train their deep learning workloads in the cloud. With Amazon SageMaker, AWS is building customer-facing services to empower data scientists and software engineers in their deep learning endeavors. As our customers rapidly adopt LLMs and Generative AI for their business, we're building the next-generation AI platform to accelerate their development. We're seeking a dedicated engineering team lead to drive the building of our next-generation AI compute platform that's optimized for LLMs and distributed training.

As an SDE, you will be responsible for designing, developing, testing, and deploying distributed machine learning systems and large-scale solutions for our worldwide customer base. In this role, you will collaborate closely with a team of ML scientists and customers to influence our overall strategy and define the team's roadmap. You'll assist in gathering and analyzing business and functional requirements and translate those requirements into technical specifications for robust, scalable, supportable solutions that work well within the overall system architecture. You will also drive the system architecture, spearhead best practices that enable a quality product, and help coach and develop junior engineers. A successful candidate will have an established background in engineering large-scale software systems, strong technical ability, great communication skills, and a motivation to achieve results in a fast-paced environment.

About You:
You are passionate about building platforms and products for large-scale deep learning model training (100-billion-parameter GPT models, 1000s of GPU devices). You have a proven track record of bringing innovative research to customers. You thrive and succeed in an entrepreneurial environment and are not hindered by ambiguity or competing priorities. Ownership, delivering results, thinking big, and analytical leadership are essential to success in this role.

You have solid experience in multithreaded, asynchronous C/Go development. You have prior experience with resource orchestration using Kubernetes, high-performance computing, building scalable systems, and large language model training.

This is a great team to join if you want to have a huge impact on AWS and the world's customers we serve!


Key job responsibilities
As a Software Development Engineer on the SageMaker team, you will be responsible for:
Developing innovative solutions for supporting Large Language Model training on a cluster of nodes;
Developing and maintaining a performant, resilient, and fully managed service built to train large-scale foundation models;
Optimizing distributed training by profiling, identifying bottlenecks, and addressing them by improving compute and network performance, as well as finding opportunities for better compute/communication overlap;
Serving as a key technical resource in the full development cycle, from conception to delivery and maintenance;
Owning delivery of an entire piece of the system and serving as technical lead on complex projects using best-practice engineering standards;
Hiring and mentoring junior development engineers.

A day in the life
Every day will bring new and exciting challenges on the job while you:

* Build and improve a next-generation AI platform using Kubernetes as the orchestration layer.
* Collaborate with internal engineering teams, leading technology companies around the world, and the open source community (PyTorch, NVIDIA/GPU).
* Create innovative products to run at scale on the AI platform and see them launched in high-volume production.



About the team
Inclusive Team Culture

Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon's culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.

Work/Life Balance

Our team puts a high value on work-life balance. It isn't about how many hours you spend at home or at work; it's about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to lifelong happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.

Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge sharing and mentorship.

Basic qualifications
3 years of non-internship professional software development experience
2 years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience programming with at least one software programming language

Preferred qualifications
3 years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations
Master's degree in computer science or equivalent
Experience contributing to the architecture and design (architecture, design patterns, reliability, and scaling) of new and current systems
Experience in machine learning, data mining, information retrieval, statistics, or natural language processing
Experience with deep learning frameworks and libraries (any of PyTorch, TensorFlow, Hugging Face, etc.)

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit
This position will remain posted until filled. Applicants should apply via our internal or external career site.

Employment Type

Full-Time

Department / Functional Area

Software Development
