VP AI Engineering — Pre-Training

Aleph Alpha

Job Location:

Berlin - Germany

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

About the Role

You've likely scaled systems most people only read about.

We're entering a phase where infrastructure decisions directly shape model capability, not just cost or reliability. The next gains won't come from brute force alone but from better systems, sharper tradeoffs, and sound judgment under real constraints.

As VP AI Engineering (Pre-Training), you'll define how large-scale training actually works here, from compute strategy and data movement to iteration speed. This role is for someone who has been close to the metal, seen what breaks at scale, and knows how to design systems that hold up under pressure.

Infrastructure in this role isn't a support function. It's a first-order driver of model outcomes. You'll be accountable for whether training runs succeed, stall, or fail, and for the decisions behind those outcomes.

This is a role for builders who have owned training systems end-to-end, not for abstract platform or governance leadership.

What You Will Lead

You will own the entire pre-training foundation including:

  • Large-scale compute strategy (GPU clusters in the thousands)

  • Training orchestration, throughput, and efficiency

  • Petabyte-scale data pipelines, storage, and data movement

  • Reliability and performance of long-running training workloads

  • Infrastructure decisions that directly impact model velocity and quality

This is not a "keep the lights on" role. You'll be expected to:

  • Make hard architectural calls with incomplete information

  • Decide where to invest, where to simplify, and where to say no

  • Work closely with modeling and post-training leaders to remove system-level constraints

  • Stay close enough to the systems to know when theory diverges from reality

Many of the decisions you'll make here are difficult to reverse. You'll commit to architectures, tooling, and operating models that must hold up in production training runs measured in weeks, not just in design documents.

You'll build a lean team around you, but this role succeeds through clarity and leverage, not headcount.

What You Bring

You've likely done several of the following already:

  • Designed, operated, or materially evolved large GPU training clusters where you were accountable for throughput, failure modes, and iteration speed, not just budget or vendor relationships

  • Built or scaled training infrastructure used by advanced ML teams in production

  • Worked with data pipelines measured in petabytes, not terabytes

  • Balanced speed, cost, and reliability under real delivery pressure

  • Made architectural decisions that held up months or years later

You've been close enough to the system to know where theoretical efficiency breaks down in practice, and you've adjusted accordingly.

We care less about where you worked and more about what you personally owned, especially when things didn't go to plan.

Why This Is Different

  • Fewer layers between decision and execution

  • Direct influence on model capability, not just infrastructure metrics

  • A system still early enough to bend, but serious enough to matter

  • A founder-led environment where clarity, speed, and judgment outweigh process

  • Meaningful equity and real ownership, not cosmetic leadership

If you've ever wanted to apply everything you've learned without the inertia of a massive organization, this is that moment.

What You Can Expect from Us

  • Become part of an AI revolution!

  • 30 days of paid vacation

  • Access to a variety of fitness & wellness offerings via Wellhub

  • Mental health support through

  • JobRad Bike Lease

  • Substantially subsidized company pension plan for your future security

  • Subsidized Germany-wide transportation ticket

  • Budget for additional technical equipment

  • Flexible working hours for better work-life balance, plus a hybrid working model

  • Virtual Stock Option Plan


Required Experience:

Executive

Key Skills

  • React Native
  • AI
  • Enterprise Software
  • React
  • Node.js
  • Redis
  • AWS
  • Software Development
  • iOS
  • Team Management
  • Product Development
  • Mobile Applications

About Company

Pioneering sovereign, European AI technology to transform human-machine interaction that can find solutions for the challenges of tomorrow.
