Why us
We believe that AI has the potential to revolutionize how cancer and other complex diseases are diagnosed and treated. We also believe that AI is a tool, not an identity: without access to high-quality data and a scientifically rigorous, transparent approach to model development, AI is just a buzzword. That's where we come in.
Aignostics is a spin-off from one of Europe's largest and most prestigious university hospitals (Charité), with employees in Berlin and New York. We have received over $50M in funding from leading investors and are a growing team of over 100 interdisciplinary professionals. We work with academic partners as well as leading global life sciences companies.
As an ML Engineering Team Lead at Aignostics, you will lead a high-performing team that builds large-scale distributed training infrastructure and workflows for digital pathology, powering our state-of-the-art Foundational Model development. This is a hands-on leadership role where you'll spend approximately 50% of your time on technical contributions while guiding your team to push the boundaries of machine learning for cancer research and diagnostics. You'll own the full employee lifecycle for your team, drive technical roadmapping, and ensure operational excellence while fostering a culture of autonomy and innovation.
At Aignostics, we believe that fighting cancer is a job for people of all identities, backgrounds, and cultures. We value and celebrate diversity and inclusion and are committed to offering equal employment and promotion opportunities for all applicants and employees. Applicants will be considered regardless of their age, disability, ethnicity, race, gender identity or expression, sexual orientation, religion, etc. We thrive through collaboration and believe the more inclusive we are, the better our work will be.
Where your expertise is needed
As a hands-on team lead, you would operate along the following dimensions:
People & Team Leadership
Build and scale a high-performing team capable of tackling complex distributed ML challenges
Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
Empower your team members and help them grow in autonomy and technical expertise
Mentor engineers at all levels, fostering a culture of continuous learning and psychological safety
Create an inclusive environment where diverse perspectives drive innovation
Strategic & Operational Management
Define and execute technical roadmaps aligned with company objectives and product needs
Lead resource allocation and capacity planning to balance team workload and business priorities
Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
Establish and track KPIs for team performance, system efficiency, and health
Technical Leadership
Design, develop, and maintain robust, large-scale distributed training pipelines and ML infrastructure using cutting-edge technologies
Lead architecture decisions for distributed systems that enable efficient model development at scale
Contribute hands-on to critical technical challenges, including optimization of training pipelines and infrastructure
Drive technical excellence through code reviews and architectural guidance
Stay at the forefront of distributed training technologies and bring innovation to the team
Cross-functional Collaboration
Partner closely with Product teams to translate business requirements into technical solutions
Collaborate with (senior) Research Scientists to enable scalable model development and experimentation
Work with Platform Engineering to ensure robust infrastructure and tooling
Build strong relationships across engineering teams to drive alignment and knowledge sharing
Communicate technical concepts effectively to both technical and non-technical stakeholders
What we are looking for
Required Skills
Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
6 years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
Proven track record of building and leading high-performing engineering teams. Experience guiding projects across the whole Software Development Life Cycle, from requirements through design to implementation, deployment, and maintenance.
Deep understanding of fundamental Machine Learning concepts and principles; familiarity with advanced model optimization techniques (such as distillation, graph optimization, quantization, etc.)
Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL). Familiarity with GPUs, distributed systems, parallel computing, and scaling laws.
Advanced programming skills in Python; experience in performance-critical languages (C/C++ or CUDA) is a plus
Familiarity with MLOps/DevOps best practices, including CI/CD, Docker, Kubernetes, and observability; cloud platforms (GCP, AWS, or Azure); and infrastructure-as-code
Experience with Linux, version control, and container technologies
Demonstrated ability in resource allocation, capacity planning, and FinOps principles
Excellent problem-solving and data-driven decision-making skills in ambiguous situations
Leadership & Soft Skills
Effective communication and stakeholder management skills
Ability to give constructive feedback and navigate difficult conversations
Proven people leadership skills with experience managing the full employee lifecycle
Strategic thinking, with the ability to balance short-term execution and long-term vision
Experience with agile methodologies and iterative development processes
Proven ability to influence without authority and build consensus across teams
Track record of empowering team members and fostering autonomy
Ideally you also have
Experience with production systems in regulated or healthcare environments; familiarity with medical device standards (ISO 13485)
Experience working with biomedical or image data
Hands-on experience with Google Kubernetes Engine, SLURM, and the Ray distributed computing framework
Experience with an advanced ML stack (TorchDynamo, JAX, TensorRT)
Familiarity with Information Security standards (ISO 27001) in software development
Experience with FinOps tools and cloud cost optimization strategies
Demonstrated experience leveraging LLM/agentic systems to accelerate development
We are still keen to hear from you if you don't match all the above points! Our needs are diverse and growing, and you are encouraged to apply if you have any combination of these skills. The recruitment process is a comparative exercise, and decisions will be made based on the applications we review at each time.
Our offer
Join a purpose-driven startup: We are working collectively to fight cancer and improve patient outcomes. Come help us make a difference!
Cutting-edge AI research and development with the involvement of Charité, TU Berlin, and our other partners
Work with a welcoming, diverse, and highly international team of colleagues
Opportunity to shape the technical direction and grow into broader leadership roles
Expand your skills by benefitting from our yearly Learning & Development budget of 1000 (plus 2 L&D days), language classes, and internal development programs
Access to leadership development programs and executive coaching
Flexible working hours and teleworking policy
Enjoy your well-deserved time off with our 30 paid vacation days per year
We are family & pet friendly and support flexible parental leave options
Pick a subsidized membership of your choice among public transport, sports, and well-being
Enjoy our social gatherings, lunches, and off-site events for a fun and inclusive work environment
Optional company pension scheme
Join us to make a difference!
We are an international, interdisciplinary team that is powering the next generation of precision medicine and advancing the fields of AI and pathology.