Senior AI Software Engineer – Model Training (f/m/d)
Heidelberg - Germany
Job Summary
Our Mission
Aleph Alpha is one of the few companies in Europe doing serious foundation model pre- and post-training. We're building models that have general-purpose capabilities and specifically excel at addressing the needs of our customers.
We're looking for exceptional Software Engineers to join our model training team. Most of the team is based in Heidelberg.
Team Culture
At Aleph Alpha, we foster a culture built on ownership, autonomy, and empowerment. Teams and individual contributors are trusted to take responsibility for their work and drive meaningful impact. We maintain a flat organizational structure with efficient, supportive management that enables quick decision-making, open communication, and a strong sense of shared purpose.
We believe a strong engineering culture is the key to model training success. We like Extreme Programming and favor trunk-based development. We often mob-program, which keeps us aligned and means we always learn from each other.
About the Role
As a Software Engineer in Model Training, you'll work across our full stack. Some weeks, you might be optimizing how training loads are scheduled on our cluster and making the pipeline more robust and performant so we can iterate faster. Other weeks, you'll be enabling large-scale code execution for reinforcement learning. And at other times, you might dig deep into our evaluation codebase to lift inference throughput on evals.
No two days are the same. Things move fast, and your ability to focus and prioritize is what lets you unblock the team day-to-day while designing the high-quality tooling and infrastructure that speeds us up long-term.
We're still building out our training pipeline and infrastructure. Some pieces exist, some don't, and you'll have real influence on what gets built and how. Your work directly shapes how quickly we can experiment and improve our models.
Your responsibilities
Co-own the training pipeline end-to-end. Design, build, and maintain the infrastructure and components that let us iterate fast on experiments.
Build high-quality tooling. Model training is a continuous effort, and we deliberately invest in our tooling and infrastructure to stay successful long-term.
Collaborate across disciplines. We believe in cross-functional teams. Engineers and researchers work closely so we can learn from each other and iterate faster together.
Champion good engineering practices. Working incrementally, maintaining fast feedback loops, and refactoring continuously keep a team successful long-term, especially when moving fast.
Shape the direction of the team. Our culture empowers individuals to take ownership. If you see that we'll need more GPUs, a different storage system, or a change to how the team is set up, you should drive this change.
Your profile
In the model training team, we hire slowly and deliberately. We recognise that we need top talent to deliver the best models, and we value ability over experience: if you think you would be a good fit for this role and training an LLM in Europe excites you, we want to hear from you.
Requirements
A track record of taking initiative to deliver high-impact work.
Experience contributing in high-performing teams.
Degree in computer science, engineering, or a related field.
Willingness to relocate to Germany. Our primary working locations are Heidelberg (preferred) and Berlin, although there is some flexibility to work from other locations in Germany with regular travel to Heidelberg (potentially weekly).
Ability to write software that other strong engineers want to read and build on.
Desire to take ownership of problems and collaborate with other teams to solve them.
Deep interest in how state-of-the-art foundation models work.
Strong communication skills, with the ability to convey technical solutions to diverse audiences.
Nice-to-haves
Experience working with distributed systems.
Experience working with Kubernetes.
We do not require prior experience in machine learning for this role, but we do value your eagerness to learn. If you have prior experience in ML, we will be particularly excited about:
Experience bringing AI research innovations into production.
Experience in areas such as large-scale data processing or distributed computation for foundation model training or inference.
Experience with performance engineering: profiling, benchmarking, and optimizing code for throughput, latency, or memory.
Compensation and benefits
Become part of an AI revolution!
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours for better work-life balance and hybrid working model
Virtual Stock Option Plan
Required Experience:
Senior IC
About Company
Pioneering sovereign, European AI technology to transform human-machine interaction that can find solutions for the challenges of tomorrow.