Salary Not Disclosed
1 Vacancy
Aleph Alpha Research's mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha's customers to increase productivity in development, engineering, logistics, and manufacturing processes.
We are looking to grow our academic partnership Lab1141 with TU Darmstadt and our GenAI group of PhD students supervised by Prof. Dr. Kersting. We are looking for an enthusiastic researcher at heart, passionate about improving foundational multimodal NLP models and aiming to obtain a PhD degree in a three-year program. On average, you will spend half of your time at Aleph Alpha Research in Heidelberg and the other half at the Technical University of Darmstadt, which is a short trip away.
As a PhD fellow at Aleph Alpha Research, you will develop new approaches to improve the foundational model architecture and its applications. You are given a unique research environment with a sufficient amount of compute, and both industrial and academic supervisors to help you conduct and publish your research.
Please formulate your dream research topic in your application letter, aligned with the work of the Foundation Models team.
For the LLM Architecture topic at Aleph Alpha Research, you will work with our Foundational Models team, in which you will create powerful, state-of-the-art multimodal foundational models, research and share novel approaches to pretraining, finetuning, and helpfulness, and enable cost-efficient inference on a variety of accelerators.
Introduction
Foundation models are central to many of the most innovative applications in deep learning and predominantly utilize self-supervised learning, autoregressive generation, and the transformer architecture. However, this learning paradigm and architecture come with several challenges. To address these limitations and improve both accuracy and efficiency in generation and downstream tasks, it is essential to consider adjustments to their core paradigms. These include the sourcing and composition of training data, design choices of the training itself, and the underlying model architecture. Furthermore, extensions of the system, such as Retrieval-Augmented Generation (RAG), and changes to foundational components, like tokenizers, should be considered.
Related Work
The training data of LLMs is at the core of a model's downstream capabilities. Consequently, recent works focus on extracting high-quality data from large corpora (Llama 3, OLMo 1.7). Additionally, the order and structure in which the data is presented to the model have a large influence on model performance, as demonstrated by curriculum learning approaches (OLMo 1.7; Ormazabal et al.; Mukherjee et al.) and more sophisticated data-packing algorithms (Staniszewski et al.; Shi et al.); the sketch below illustrates the basic packing idea.
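To make the packing idea concrete, here is a minimal greedy sketch that fills fixed-size training contexts with whole documents to reduce padding and truncation. All names are ours and the heuristic is deliberately simple; the cited works use considerably more sophisticated algorithms.

```python
# Minimal greedy sequence-packing sketch (illustrative only, not the
# algorithm from Staniszewski et al. or Shi et al.): fill fixed-size
# contexts with tokenized documents to reduce padding and truncation.

def pack_documents(docs: list[list[int]], context_len: int) -> list[list[int]]:
    """Greedily pack tokenized documents into contexts of at most context_len tokens."""
    contexts: list[list[int]] = []
    current: list[int] = []
    for doc in sorted(docs, key=len, reverse=True):  # longest-first heuristic
        if len(doc) > context_len:
            doc = doc[:context_len]  # truncate over-long documents
        if len(current) + len(doc) > context_len:
            contexts.append(current)  # start a fresh context
            current = []
        current.extend(doc)
    if current:
        contexts.append(current)
    return contexts

# Example: pack toy "documents" of token ids into 8-token contexts.
print(pack_documents([[1] * 5, [2] * 3, [3] * 6, [4] * 2], context_len=8))
```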
Similarly, adjustments to the training procedure itself have shown promising results. For example, Ibrahim et al. discuss infinite learning rate schedules that allow for more flexibility in adjusting training steps and facilitate continual-pretraining tasks more easily.
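As a rough illustration of the schedule shape, the sketch below implements a warmup, an indefinitely extendable constant plateau, and a cooldown applied only when finalizing a checkpoint. The constants and function signature are hypothetical; see Ibrahim et al. for the actual schedules.

```python
# Sketch of an "infinite" learning-rate schedule: warmup, then a constant
# plateau that can be extended indefinitely, with a linear cooldown applied
# only when a checkpoint is finalized. All constants are illustrative.

def lr_at(step: int, peak: float = 3e-4, warmup: int = 1000,
          floor: float = 3e-5, cooldown_start: int | None = None,
          cooldown_len: int = 2000) -> float:
    if step < warmup:                          # linear warmup
        return peak * step / warmup
    if cooldown_start is None or step < cooldown_start:
        return peak                            # plateau, extendable forever
    t = min((step - cooldown_start) / cooldown_len, 1.0)
    return peak + (floor - peak) * t           # linear cooldown to a floor

# To continue or continually pretrain, resume from a plateau checkpoint
# rather than from a fully decayed one.
print(lr_at(500), lr_at(50_000), lr_at(51_000, cooldown_start=50_000))
```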
Furthermore, the LLM architecture and its components leave room for improvement. Ainslie et al. introduce grouped-query attention (GQA), which increases the efficiency of the transformer's attention component. Liu et al. make changes to the rotary position embeddings to improve long-context understanding.
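The core of GQA is that several query heads share one key/value head, shrinking the KV cache. Below is a minimal PyTorch sketch of that idea; shapes, names, and the omitted causal mask are our simplifications, not the implementation from Ainslie et al.

```python
import torch

# Minimal grouped-query attention (GQA) sketch: n_q query heads share
# n_kv < n_q key/value heads. Causal masking is omitted for brevity.

def gqa(x: torch.Tensor, wq, wk, wv, n_q: int, n_kv: int) -> torch.Tensor:
    b, t, d = x.shape
    hd = d // n_q                                   # per-head dimension
    q = wq(x).view(b, t, n_q, hd).transpose(1, 2)   # (b, n_q, t, hd)
    k = wk(x).view(b, t, n_kv, hd).transpose(1, 2)  # (b, n_kv, t, hd)
    v = wv(x).view(b, t, n_kv, hd).transpose(1, 2)
    g = n_q // n_kv                                 # query heads per KV head
    k = k.repeat_interleave(g, dim=1)               # broadcast KV to all groups
    v = v.repeat_interleave(g, dim=1)
    att = (q @ k.transpose(-2, -1)) / hd ** 0.5     # scaled dot-product scores
    out = att.softmax(dim=-1) @ v                   # (b, n_q, t, hd)
    return out.transpose(1, 2).reshape(b, t, d)

d, n_q, n_kv = 64, 8, 2
wq = torch.nn.Linear(d, d)
wk = torch.nn.Linear(d, n_kv * (d // n_q))          # smaller KV projections
wv = torch.nn.Linear(d, n_kv * (d // n_q))
print(gqa(torch.randn(1, 5, d), wq, wk, wv, n_q, n_kv).shape)  # (1, 5, 64)
```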
Recently, structured state-space sequence models (SSMs) (Gu et al.; Poli et al.) and hybrid architectures have emerged as a promising class of architectures for sequence modeling.
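To convey the core primitive behind this model family, here is a toy linear state-space recurrence. The cited architectures add careful parameterization, discretization, and hardware-efficient scans on top; this is only a conceptual sketch with made-up dimensions.

```python
import torch

# Toy linear state-space recurrence underlying SSM-based sequence models:
#   h_t = A h_{t-1} + B u_t,   y_t = C h_t
# Real architectures (Gu et al.; Poli et al.) use structured matrices and
# parallel scans; this naive loop is for illustration only.

def ssm_scan(u: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
             C: torch.Tensor) -> torch.Tensor:
    t_len, _ = u.shape
    h = torch.zeros(A.shape[0])          # hidden state
    ys = []
    for t in range(t_len):               # sequential scan over time
        h = A @ h + B @ u[t]
        ys.append(C @ h)
    return torch.stack(ys)               # (t_len, d_out)

d_state, d_in, d_out = 4, 2, 3
y = ssm_scan(torch.randn(10, d_in), 0.9 * torch.eye(d_state),  # stable A
             torch.randn(d_state, d_in), torch.randn(d_out, d_state))
print(y.shape)  # (10, 3)
```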
Lastly, the model itself can be embedded in a larger system such as RAG. For example, in-context learning via RAG enhances the generation's accuracy and credibility (Gao et al.), particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information.
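A minimal retrieve-then-generate loop makes this concrete: fetch the most relevant documents and prepend them to the prompt as in-context evidence. Here `embed` and `generate` are hypothetical stand-ins for a real embedding model and LLM, and the dot-product scoring and prompt format are our simplifications.

```python
# Minimal retrieve-then-generate (RAG) sketch. `embed` and `generate`
# are placeholder callables, not a specific library's API; similarity
# is a plain dot product over whatever vectors `embed` returns.

def rag_answer(question: str, corpus: list[str], embed, generate, k: int = 3) -> str:
    q = embed(question)
    scored = sorted(corpus, reverse=True,
                    key=lambda doc: sum(a * b for a, b in zip(q, embed(doc))))
    context = "\n\n".join(scored[:k])     # top-k documents as evidence
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)               # generation grounded in retrieved text

# Toy usage with stand-in embed/generate functions:
toy_embed = lambda s: [s.count(w) for w in ("heidelberg", "darmstadt", "phd")]
toy_generate = lambda p: p.splitlines()[-1]
print(rag_answer("phd in darmstadt?",
                 ["phd program in darmstadt", "weather in heidelberg"],
                 toy_embed, toy_generate))
```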
Goals
This project aims to explore novel LLM-system architectures, data, and training paradigms that could either replace or augment traditional autoregressive generation and transformer components, as well as enhance auxiliary elements such as retrievers and tokenizers.
Research and development of novel approaches and algorithms that improve training, inference, interpretation, or applications of foundational models
Analysis and benchmarking of state-of-the-art as well as new approaches
Collaborating with scientists and engineers at Aleph Alpha and Aleph Alpha Research, plus chosen external industrial and academic partners
In particular, fruitful interactions with our group of GenAI PhD students and fostering exchange between Aleph Alpha Research and your university
Publishing your own and collaborative work at machine learning venues and making code and models source-available for use by the broader research community
Master's degree in Computer Science, Mathematics, or a similar field
Solid understanding of DL/ML techniques, algorithms, and tools for training and inference
Experience and knowledge of Python and at least one common deep-learning framework, preferably PyTorch
Readiness to relocate to the Heidelberg/Darmstadt region, Germany
Interest in bridging the gap between addressing practical industry challenges and contributing to academic research
Ambition to obtain a PhD in generative machine learning in a three-year program
We believe embodying these values would make you a great fit in our team:
We own work end-to-end from idea to production: You take responsibility for every stage of the process, ensuring that our work is complete, scalable, and of the highest quality.
We ship what matters: Your focus is on solving real problems for our customers and the research community. You prioritize delivering impactful solutions that bring value and make a difference.
We work transparently: You collaborate and share your results openly with the team, partners, customers, and the broader community through publishing and sharing results and insights, including blog posts, papers, checkpoints, and more.
We innovate through leveraging our intrinsic motivations and talents: We strive for technical depth, balance the ideas and interests of our team with our mission-backwards approach, and leverage the interdisciplinary, diverse perspectives in our teamwork.
Become part of an AI revolution!
30 days of paid vacation
Public transport subsidy
Fitness and wellness offerings (Wellhub)
Mental health platform nilo.health
Share parts of your work via publications and source-available code
Flexible working hours and hybrid working model
Full-Time