VP AI Engineering — Pre-Training

Berlin - Germany

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About the Role

You've likely scaled systems most people only read about.

We're entering a phase where infrastructure decisions directly shape model capability, not just cost or reliability. The next gains won't come from brute force alone, but from better systems, sharper tradeoffs, and sound judgment under real constraints.

As VP AI Engineering (Pre-Training), you'll define how large-scale training actually works here, from compute strategy and data movement to iteration speed. This role is for someone who has been close to the metal, seen what breaks at scale, and knows how to design systems that hold up under pressure.

Infrastructure in this role isn't a support function. It's a first-order driver of model outcomes. You'll be accountable for whether training runs succeed, stall, or fail, and for the decisions behind those outcomes.

This is a role for builders who have owned training systems end-to-end, not for abstract platform or governance leadership.

What You Will Lead

You will own the entire pre-training foundation, including:

Large-scale compute strategy (GPU clusters in the thousands)
Training orchestration, throughput, and efficiency
PB-scale data pipelines, storage, and data movement
Reliability and performance of long-running training workloads
Infrastructure decisions that directly impact model velocity and quality

This is not a "keep the lights on" role. You'll be expected to:

Make hard architectural calls with incomplete information
Decide where to invest, where to simplify, and where to say no
Work closely with modeling and post-training leaders to remove system-level constraints
Stay close enough to the systems to know when theory diverges from reality

Many of the decisions you'll make here are difficult to reverse. You'll commit to architectures, tooling, and operating models that must hold up in production training runs measured in weeks, not just in design documents.

You'll build a lean team around you, but this role succeeds through clarity and leverage, not headcount.

What You Bring

You've likely done several of the following already:

Designed, operated, or materially evolved large GPU training clusters where you were accountable for throughput, failure modes, and iteration speed, not just budget or vendor relationships
Built or scaled training infrastructure used by advanced ML teams in production
Worked with data pipelines measured in petabytes, not terabytes
Balanced speed, cost, and reliability under real delivery pressure
Made architectural decisions that held up months or years later

You've been close enough to the system to know where theoretical efficiency breaks down in practice, and you've adjusted accordingly.

We care less about where you worked and more about what you personally owned, especially when things didn't go to plan.

Why This Is Different

Fewer layers between decision and execution
Direct influence on model capability, not just infrastructure metrics
A system still early enough to bend, but serious enough to matter
A founder-led environment where clarity, speed, and judgment outweigh process
Meaningful equity and real ownership, not cosmetic leadership

If you've ever wanted to apply everything you've learned without the inertia of a massive organization, this is that moment.

What You Can Expect from Us

Become part of an AI revolution!
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through
JobRad Bike Lease
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours for better work-life balance and a hybrid working model
Virtual Stock Option Plan

Required Experience:

Exec

Key Skills

React Native
AI
Enterprise Software
React
Node.js
Redis
AWS
Software Development
iOS
Team Management
Product Development
Mobile Applications

About Company

Aleph Alpha

Pioneering sovereign, European AI technology to transform human-machine interaction that can find solutions for the challenges of tomorrow.
