Sr Data Scientist - Innovation Lab

Hyderabad - India

Monthly Salary: Not Disclosed

Experience Required: 6 years

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Sr Data Scientist

Job Responsibilities:

LLM Architecture: Good understanding of the architecture underlying large language models, such as Transformer-based models and their variants. Design and implement deep learning model architectures using PyTorch.
Language Model Training and Fine-Tuning: Experience in training large-scale language models from scratch as well as fine-tuning pre-trained models on domain data.
Data Preprocessing for NLP: Skilled in preprocessing textual data, including tokenization, stemming, lemmatization, and handling of different text encodings.
Transfer Learning and Adaptation: Proficiency in applying transfer learning techniques to adapt existing LLMs to new languages, domains, or specific business needs.
Data Annotation and Evaluation: Skills in designing and implementing data annotation strategies for training LLMs and in evaluating their performance using appropriate metrics.
Scalability and Deployment: Experience in scaling LLMs for production environments, ensuring efficiency and robustness in deployment.
Model Training, Optimization, and Evaluation: Evaluate the performance of PyTorch models using appropriate metrics and techniques such as cross-validation, holdout sets, or online evaluation. This encompasses the complete cycle of training, fine-tuning, and validating language models. You will design and adapt LLMs for use in virtual assistants, information retrieval and extraction, and similar applications.
Experimentation with Emerging Technologies and Methods: Actively explore new technologies and methodologies in language model development, including experimental frameworks and software tools.
LLM Alignment: Understanding of alignment algorithms such as DPO, PPO, KPO, and RLHF, and applying them to build guardrails.
AI Data Retrieval: Retrieve data from unstructured sources and extract key-value pairs using techniques such as Donut, LayoutLM, and Table Transformer.
Analyze data and perform exploratory data analysis (EDA) to identify data patterns.
Strong, hands-on understanding of deep learning and NLP concepts.
Proficient in TensorFlow and similar libraries.

Required Qualifications

5 years of hands-on experience developing and deploying large language models and other machine learning models, and working with PyTorch.
A thorough understanding of machine learning, particularly deep learning techniques, including knowledge of neural network architectures, training methods, and optimization algorithms.
Proficiency in Python for AI development, including experience with NLP libraries (e.g., Hugging Face Transformers, NLTK, spaCy) and text classification.
Experience with frameworks: PyTorch or TensorFlow.
Experience with cloud services (AWS, Azure) and ML deployment tools such as Docker.
Familiarity with model fine-tuning and optimization techniques for LLMs.
Proven track record of innovative solutions in the field of LLMs.
Strong communication skills with the ability to explain complex AI concepts to non-expert audiences.

Additional good-to-have qualifications:

4 years of experience in data analytics, data science, or quantitative analysis, using statistical computer languages to draw insights from large data sets.
3 years of experience in Python development, preferably delivering production code for data applications.
Experience with unstructured data or computer vision models is a plus.
Experience with SQL is a big plus.
Extensive model implementation experience using scikit-learn.
Experience designing and developing security-critical applications; experience with the specifics of HIPAA/PHI/PII/GDPR is a big plus.
Basic experience with Linux, Git, and Jupyter Notebooks is a must.
Knowledge of Agile development practices.
Flexibility and adaptability to respond to a rapidly changing environment.
Experience with distributed computational techniques and job orchestration tools and platforms (e.g., Airflow) is very valuable.


Education

M.Tech
