Software Development Engineer, ML Systems Integration, Machine Learning Israel (MLIL) — Integration Validation

Amazon

Job Location:

Tel Aviv - Israel

Monthly Salary: Not Disclosed
Posted on: 14 hours ago
Vacancies: 1 Vacancy

Department:

Software Development

Job Summary

Annapurna Labs designs silicon and software that accelerates innovation. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before and deliver results that help our customers change the world.
The Integration team is looking for a Senior Software Development Engineer to lead the design and delivery of systems software for our next-generation ML accelerator servers. In this role, you will own the design and implementation of CI/CD pipelines, test frameworks, and system-level validation for our next-generation ML inference accelerator platform. You will work across the full stack, from firmware interfaces through data-plane performance benchmarking to production fleet readiness, ensuring every component is validated end-to-end before it reaches customers.
This is a greenfield environment with rapidly growing scope: new silicon, new software stacks (vLLM, NKI, NIXL), and new fleet-scale challenges. We are looking for a senior IC who can independently drive technical decisions, scale our validation infrastructure, and raise the bar on engineering quality across the group.


Key job responsibilities
- Own and evolve CI/CD pipelines, from pre-merge gates through continuous deployment to fleet.
- Design and implement test frameworks that enable firmware and data-plane developers to write, run, and maintain tests with minimal friction.
- Architect system-level test suites that stress control-plane and data-plane components beyond provisioning and vetting flows.
- Build and maintain performance benchmarking infrastructure for LLM inference workloads (Prefill, Decode), including dashboarding and regression detection.
- Drive integration of third-party vendor code (nightly drops) into CI/CD, ensuring quality gates catch regressions early.
- Participate in feature design reviews, contributing test plans and challenging coverage gaps.
- Define and own Continuous Testing in production environments (CTS).
- Leverage AI-assisted development tools (Kiro, LLM-based code generation) to accelerate team velocity and pioneer new engineering workflows.


A day in the life
You'll start your day reviewing CI pipeline results from overnight runs, triaging failures to determine whether a regression came from a vendor code drop, a firmware change, or an ML serving stack update. Mid-morning, you might pair with a hardware engineer to design test cases for a new bus-level reset flow, then pivot to extending the performance benchmarking framework to catch a latency regression. After lunch, you'll join a feature design review, challenging test coverage gaps and deciding where system-level validation needs to live. The rest of your afternoon could be spent writing a new pipeline stage that gates deployment on accuracy checks or building a dashboard that gives the group visibility into fleet-readiness metrics. Throughout the day, you'll lean on AI-assisted development tools to accelerate everything from infrastructure code to root-cause analysis.

Basic qualifications
- Experience as a mentor, tech lead, or leading an engineering team
- Experience leading the architecture and design (architecture, design patterns, reliability, and scaling) of new and current systems
- Knowledge of Python and/or C programming
- Experience programming with at least one modern language such as Java, C, or C#, including object-oriented design
- Experience building test automation frameworks and tools
- Proven experience designing and operating CI/CD systems at scale (any platform: Jenkins, GitHub Actions, internal equivalents)
- Demonstrated early adopter of AI-assisted development tools; uses LLMs, code-generation agents, or similar tools as a core part of daily workflow
- Strong Linux systems knowledge

Preferred qualifications
- Bachelor's degree in computer science or equivalent
- Experience with AWS services, including EC2, Lambda, S3, DynamoDB, and SQS
- Experience with hardware/software integration and real-time systems
- Familiarity with ML inference serving stacks (vLLM, TensorRT-LLM, Triton, or similar)
- Knowledge of Amazon internal tooling (Brazil, Pipelines, Apollo, ToD)
- Experience with performance benchmarking and profiling of GPU/accelerator workloads
- Track record of leading technical initiatives across multiple teams
- Experience with fleet-scale operations: monitoring, dashboarding, incident response

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.


Required Experience:

IC


About Company

Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...