Software Engineer (Large Scale Training)

Lightricks

Job Location:

West Jerusalem - Israel

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Who we are

Lightricks is an AI-first company creating next-generation content creation technology for businesses, enterprises, and studios, with a mission to bridge the gap between imagination and creation. At our core is LTX-2, an open-source generative video model built to deliver expressive, high-fidelity video at unmatched speed. It powers both our own products and a growing ecosystem of partners through API integration.

The company is also known globally for pioneering consumer creativity through products like Facetune, one of the world's most recognized creative brands, which helped introduce AI-powered visual expression to hundreds of millions of users worldwide. We combine deep research, user-first design, and end-to-end execution, from concept to final render, to bring the future of expression to all.

About the Role

This is a software engineering role on an ML team. You'll own the systems that make large-scale model training fast, reliable, and pleasant to work with: the distributed training framework, the data pipelines feeding it, the performance characteristics of every step on the critical path, and the day-to-day developer experience for the researchers who depend on it.

You don't need to come in as an ML expert. You do need to be a strong engineer who gets excited about hard systems problems: squeezing throughput out of accelerator clusters, hunting down stragglers across hundreds of machines, designing abstractions that hold up as the codebase grows, and making the unglamorous parts of training infrastructure work well.

If you've ever looked at a large-scale system and thought "there's no reason this should be this slow / inefficient / hard to maintain / complex," this role is built for you.

Key Responsibilities

Build and maintain the distributed training framework: orchestration, checkpointing, fault tolerance, observability, and the ergonomics researchers interact with daily.
Profile end-to-end training runs and eliminate bottlenecks wherever they live: compute, memory, interconnect, storage, or the data pipeline.
Collaborate with researchers to translate model ideas into training code that runs efficiently, and flag when an architectural choice will be expensive before it ships.
Own a shared codebase the team relies on: correctness, readability, testing, and long-term maintainability matter as much as the benchmark numbers.
Work close to the metal where it pays off: write or integrate custom GPU kernels, tune collective communication, and exploit hardware features that off-the-shelf frameworks leave on the table.

Your skills and experience

2 years of professional software engineering experience, ideally including work on performance-sensitive or distributed systems.
Strong software engineering fundamentals. You write clean, tested, maintainable Python, and you're comfortable reading and writing modern C.
Real experience with performance work: profiling, optimization, and reasoning about systems where latency, throughput, and resource contention actually matter.
Comfort with distributed systems: you've debugged things that only break at scale and have intuitions for where they tend to go wrong.
A bias toward understanding systems end-to-end rather than treating any layer as a black box.
Familiarity with Kubernetes or similar environments for running and scaling large workloads.

* ML training experience is a bonus. If you have it, great, but we'd rather hire a strong systems engineer who's curious about ML than an ML engineer who's lukewarm about infrastructure.

Nice to have

Working knowledge of at least one accelerator architecture (GPU, TPU, or similar), or a clear track record of going deep on hardware when the problem calls for it.
Experience with JAX/Pallas, Triton, CUDA, OpenCL, Metal, or similar accelerator programming.
Prior exposure to ML training pipelines, even informally; pet projects count.

Why Join Us

We're here to push the boundaries of what's possible with AI and video - not for the buzz, but for the craft, the challenge, and the chance to make something genuinely new.
We believe in an environment where people are encouraged to think, create, and explore. Real impact happens when people are empowered to experiment, evolve, and elevate together. At Lightricks, every breakthrough starts with great people and a collaborative mindset. If you're looking for a place that combines deep tech, creative energy, and zero buzzword culture, you might be in the right place.

We've got you covered:

We run daily door-to-door shuttles and offer Car-to-go subscriptions for several locations in central Israel, plus free parking and train-station pickups.
We're proud to have 2 chef-led restaurants on site by the legendary Machneyuda Group (yes, that Machneyuda!), plus a bakery nestled in the heart of our office, filled daily with the scent of fresh pastries.
We empower employees with cutting-edge tools and learning opportunities to grow and succeed through workshops, access and training on platforms, subscriptions, and clear guidelines for responsible AI use.

About Company

Lightricks

Lightricks was founded in 2013 by five entrepreneurs from the Hebrew University of Jerusalem, four of them CS PhD students. Based in Israel with offices in London and Germany, the company has grown from five founders to a team of over 300. Lightricks is fast-becoming the go-to creat ...
