The increasing use of Natural Language to SQL (NL2SQL) techniques is transforming how large language models (LLMs) bridge the gap between complex industrial data and its users, enabling domain experts to interact with data in natural language. However, challenges remain in optimizing and evaluating NL2SQL outputs, particularly for interactive AI applications and specialized domains such as semiconductor data visualization. This project aims to investigate and improve NL2SQL methods to support our Data Viz tool.
- During your thesis, you will conduct a thorough analysis of the existing domain knowledge, advanced NL2SQL methods, and existing evaluation techniques. This includes reviewing methods such as fine-tuning and retrieval-augmented generation (RAG), and researching how AI agents are applied in NL2SQL tasks. Based on this review, you will identify key areas for improvement.
- You will automate the collection, preparation, and management of training and testing datasets by developing scripts and tools for data extraction and preprocessing. You will ensure that data quality is sufficient to represent domain knowledge across various scenarios.
- Furthermore, you will design and implement several suitable methods to enhance NL2SQL performance and retrieval accuracy, and benchmark them against each other.
- Moreover, you will establish robust evaluation methods to measure improvements in NL2SQL results. You will define metrics for evaluating NL2SQL performance (e.g. accuracy, efficiency) and implement evaluation protocols to assess improvements systematically.
- Finally, you will rigorously test and validate the effectiveness of the enhanced NL2SQL system, comparing results against a baseline to measure improvement. You will compile comprehensive documentation of the entire process, including methodologies, code, processes, and results, and prepare a final report summarizing findings and potential future work.
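To give a rough idea of what an accuracy metric for NL2SQL can look like, here is a minimal sketch of execution accuracy: a predicted query counts as correct if it returns the same rows as a reference query. This is only an illustration, not the evaluation protocol used in the project. The `wafers` table, its columns, the example queries, and the function names are all hypothetical, and SQLite stands in for whatever database the Data Viz tool actually uses.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, reference) SQL pairs that return the same rows.

    Row order is ignored; duplicate rows are counted.
    """
    if not pairs:
        return 0.0
    hits = 0
    for predicted, reference in pairs:
        reference_rows = run_query(conn, reference)
        predicted_rows = run_query(conn, predicted)
        if reference_rows is not None and predicted_rows == reference_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wafers (lot TEXT, yield_pct REAL)")
conn.executemany(
    "INSERT INTO wafers VALUES (?, ?)",
    [("A", 91.5), ("B", 87.0), ("C", 93.2)],
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT lot FROM wafers WHERE yield_pct > 90",
     "SELECT lot FROM wafers WHERE yield_pct >= 90.0 ORDER BY lot"),
    # Predicted query references a table that does not exist: counted as wrong.
    ("SELECT COUNT(*) FROM wafer",
     "SELECT COUNT(*) FROM wafers"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}")
```

Comparing result rows rather than SQL strings avoids penalizing queries that are written differently but are semantically equivalent on the test data. It can still overestimate correctness when the test database is too small to distinguish two different queries.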
Qualifications:
- Education: Master's studies in Computer Science, Engineering, Microelectronics, or a comparable field
- Experience and Knowledge: good knowledge of Python; experience with SQL, Data Science, AI, GenAI, LLMs, microelectronics, and web development is a plus
- Personality and Working Practice: you have a growth mindset
- Languages: good English skills
Additional Information:
Start: according to prior agreement
Duration: 6 months
Enrollment at a university is required for this thesis. Please attach your CV, transcript of records, examination regulations and, if indicated, a valid work and residence permit.
Diversity and inclusion are not just trends for us but are firmly anchored in our corporate culture. Therefore, we welcome all applications, regardless of gender, age, disability, religion, ethnic origin, or sexual identity.
Need further information about the job?
Xueming Li (Functional Department)
Remote Work:
No
Employment Type:
Full-time