PhD Fellowship LLM Architecture Optimization (f/m/d)

Job Location: Heidelberg - Germany

Monthly Salary: Not Disclosed

Vacancy: 1

Job Description

Aleph Alpha Research's mission is to deliver category-defining AI innovation that enables open, accessible and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha's customers to increase productivity in development, engineering, logistics and manufacturing processes.

We are looking to grow our academic partnership Lab1141 with TU Darmstadt and our GenAI group of PhD students supervised by Prof. Dr. Kersting. We are looking for an enthusiastic researcher at heart who is passionate about improving foundational multimodal NLP models and aims to obtain a PhD degree in a three-year program. On average, you will spend half of your time at Aleph Alpha Research in Heidelberg and the other half at the Technical University of Darmstadt, which is within easy traveling distance.

As a PhD fellow at Aleph Alpha Research, you will develop new approaches to improve the foundational model architecture and its applications. You are given a unique research environment with a sufficient amount of compute and both industrial and academic supervisors to conduct and publish your research.

Please formulate your dream research topic in your application letter, aligned with the work of the Foundation Models team.

While at Aleph Alpha Research, you will work on the LLM Architecture topic with our Foundation Models team, in which you will create powerful state-of-the-art multimodal foundational models, research and share novel approaches to pretraining, finetuning and helpfulness, and enable cost-efficient inference on a variety of accelerators.

Topic:

Introduction

Foundation models are central to many of the most innovative applications in deep learning and predominantly utilize self-supervised learning, autoregressive generation and the transformer architecture. However, this learning paradigm and architecture come with several challenges. To address these limitations and improve both accuracy and efficiency in generation and downstream tasks, it is essential to consider adjustments to their core paradigms. These include the sourcing and composition of training data, design choices of the training itself, and the underlying model architecture. Furthermore, extensions of the system, such as Retrieval-Augmented Generation (RAG), and changes to foundational components, like tokenizers, should be considered.
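For concreteness, the autoregressive generation mentioned above amounts to predicting one token at a time conditioned on everything generated so far. The minimal greedy-decoding sketch below illustrates this; the `model` object and its call signature are hypothetical placeholders, not a specific Aleph Alpha interface.

```python
# Minimal sketch of an autoregressive (greedy) decoding loop.
# `model` is a hypothetical callable returning logits of shape (batch, seq_len, vocab_size).
import torch

@torch.no_grad()
def greedy_decode(model, input_ids: torch.Tensor, max_new_tokens: int = 32) -> torch.Tensor:
    """Generate tokens one at a time, each step conditioning on all previous tokens."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                                     # (batch, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)    # most likely next token
        input_ids = torch.cat([input_ids, next_token], dim=-1)        # append and repeat
    return input_ids
```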

Related Work

The training data of LLMs is at the core of a model's downstream capabilities. Consequently, recent works focus on extracting high-quality data from large corpora (Llama 3, OLMo 1.7). Additionally, the order and structure in which the data is presented to the model have a large influence on model performance, as demonstrated by curriculum learning approaches (OLMo 1.7, Ormazabal et al., Mukherjee et al.) and more sophisticated data packing algorithms (Staniszewski et al., Shi et al.).
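To illustrate what data packing means in practice, the sketch below greedily packs tokenized documents into fixed-length training sequences. The cited works use more sophisticated, similarity-aware strategies; this only shows the baseline idea, and all names and parameters are chosen for illustration.

```python
# Illustrative greedy first-fit packer: concatenate tokenized documents into
# fixed-length training sequences, separating documents with an EOS token.
from typing import List

def pack_documents(docs: List[List[int]], seq_len: int, eos_id: int) -> List[List[int]]:
    bins: List[List[int]] = []
    for doc in docs:
        doc = doc + [eos_id]                      # mark the document boundary
        placed = False
        for b in bins:                            # first bin with enough free space wins
            if len(b) + len(doc) <= seq_len:
                b.extend(doc)
                placed = True
                break
        if not placed:
            bins.append(doc[:seq_len])            # start a new bin; truncate overlong documents
    return bins
```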

Similarly, adjustments to the training procedure itself have shown promising results. For example, Ibrahim et al. discuss infinite learning rate schedules that allow for more flexibility in adjusting training steps and facilitate continual pretraining tasks more easily.
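One plausible shape of such an "infinite" schedule is sketched below: a linear warmup, a decay to a constant plateau that can be held for as long as pretraining continues, with a short annealing phase applied only when training actually stops (not modeled here). The phases and hyperparameters are illustrative assumptions, not the exact schedule from Ibrahim et al.

```python
# Sketch of an "infinite" learning-rate schedule: warmup, decay, then a constant
# plateau that never forces training to end. Values are illustrative only.
def infinite_lr(step: int, peak: float = 3e-4, floor: float = 3e-5,
                warmup: int = 1000, decay: int = 10000) -> float:
    if step < warmup:                              # linear warmup to the peak LR
        return peak * step / warmup
    if step < warmup + decay:                      # linear decay from peak down to the plateau
        frac = (step - warmup) / decay
        return floor + (peak - floor) * (1.0 - frac)
    return floor                                   # constant plateau: training can continue indefinitely
```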

Further, the LLM architecture and its components leave room for improvement. Ainslie et al. introduce grouped-query attention (GQA), which increases the efficiency of the transformer's attention component. Liu et al. make changes to the rotary position embeddings to improve long-context understanding.
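To make the GQA idea concrete, here is a compact sketch in which several query heads share one key/value head, shrinking the KV cache. The tensor shapes and the use of repeat_interleave are illustrative assumptions, not the implementation from Ainslie et al.

```python
# Sketch of grouped-query attention: fewer KV heads than query heads,
# with each KV head broadcast to its group of query heads.
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // n_kv_heads           # query heads per shared KV head
    k = k.repeat_interleave(group_size, dim=1)     # broadcast each KV head to its query group
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v
```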

Recently, structured state-space sequence models (SSMs) (Gu et al., Poli et al.) and hybrid architectures have emerged as a promising class of architectures for sequence modeling.
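At their core, SSMs compute a linear recurrence over a hidden state. The naive scan below shows only that recurrence; the cited models rely on careful parameterization and much faster convolutional or parallel-scan implementations, so this is purely an illustrative sketch.

```python
# Naive sequential scan of a linear state-space model:
#   x_t = A x_{t-1} + B u_t,   y_t = C x_t
import numpy as np

def ssm_scan(A: np.ndarray, B: np.ndarray, C: np.ndarray, u: np.ndarray) -> np.ndarray:
    x = np.zeros(A.shape[0])            # hidden state of dimension state_dim
    outputs = []
    for u_t in u:                       # sequential scan over the scalar input sequence
        x = A @ x + B * u_t             # update the hidden state
        outputs.append(C @ x)           # read out the scalar output
    return np.array(outputs)
```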

Lastly, the model itself can be embedded in a larger system such as RAG. For example, in-context learning via RAG enhances the generation's accuracy and credibility (Gao et al.), particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information.
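The RAG loop described here reduces to three steps: retrieve relevant passages, assemble them into the prompt as in-context evidence, and generate. The sketch below uses hypothetical `retriever` and `llm` interfaces rather than any specific library's API.

```python
# High-level sketch of a retrieval-augmented generation pipeline.
# `retriever` and `llm` are hypothetical callables, not a real library's API.
from typing import Callable, List

def rag_answer(question: str,
               retriever: Callable[[str, int], List[str]],
               llm: Callable[[str], str],
               top_k: int = 3) -> str:
    passages = retriever(question, top_k)     # fetch the most relevant documents
    context = "\n\n".join(passages)           # place them in the prompt as in-context evidence
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)                        # the model grounds its answer in the retrieved text
```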

Goals

This project aims to explore novel LLM-system architectures, data and training paradigms that could either replace or augment traditional autoregressive generation and transformer components, as well as enhance auxiliary elements such as retrievers and tokenizers.

Your responsibilities:

Your profile:

Our tenets

We believe embodying these values would make you a great fit for our team:

What you can expect from us

Employment Type

Full-Time

About Company
