Specific contributions expected from the role:
- Infrastructure as Code (IaC).
- Inference Optimization: Develop and optimize high-throughput, low-latency inference engines for LLMs (e.g., Llama 3, Mistral) using C++ and CUDA.
- Performance Engineering: Profile and eliminate bottlenecks across the software stack, from Python-level orchestration down to GPU kernel execution (see the profiling sketch after this list).
- Memory Management: Implement advanced memory techniques such as KV cache optimization, PagedAttention, and model quantization (INT8/FP8/AWQ) to maximize hardware utilization (quantization and block-table sketches follow this list).
- Distributed Systems: Architect and maintain distributed serving systems capable of multi-node, multi-GPU inference using technologies such as Ray, vLLM, or TGI (see the vLLM sketch after this list).
- Framework Integration: Build and maintain high-performance Python bindings (pybind11) for C++ backends to expose system-level optimizations to the AI research team.
- Tooling & Observability: Build custom profiling tools and dashboards to monitor TTFT (Time to First Token), throughput, and hardware telemetry (e.g., NVIDIA SMI). A TTFT measurement sketch follows this list.
- Proficiency in and hands-on experience with large models and deep neural networks.
- Expertise in large language models (LLMs).
- Extensive experience in system and platform architecture.
- Development experience, preferably in memory, storage, or other embedded systems.
- In-depth knowledge of and extensive experience with standardization efforts, technical papers, and patents.
- Extensive experience with C/C++ and Python programming.
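A few illustrative sketches of the techniques named above follow, all in Python and all hypothetical rather than drawn from any specific codebase. First, the Performance Engineering bullet: the Python-orchestration half of the stack can be profiled with the standard library alone (GPU kernels need tools like Nsight), and a cProfile pass like this often surfaces scheduling overhead first. The `orchestrate` function is a placeholder:

```python
import cProfile
import pstats

def orchestrate() -> int:
    # Hypothetical stand-in for Python-level request batching/scheduling work.
    return sum(i * i for i in range(1_000_000))

# Profile the statement and dump stats to a file, then print the hotspots.
cProfile.run("orchestrate()", "orchestrate.prof")
stats = pstats.Stats("orchestrate.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```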
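For the quantization side of the Memory Management bullet, here is a minimal sketch of symmetric per-tensor INT8 quantization. Production engines typically quantize per-channel and fuse dequantization into the GPU kernels; the function names here are illustrative:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8: map [-amax, amax] onto [-127, 127]."""
    amax = float(np.max(np.abs(weights)))
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(w - dequantize_int8(q, scale)))
print(f"max round-trip error: {err:.4f} (bounded by scale/2 = {scale / 2:.4f})")
```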
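The PagedAttention idea in the same bullet reduces to indirecting KV-cache reads through a block table: each sequence allocates fixed-size blocks on demand instead of reserving a contiguous worst-case region, and finished sequences return blocks to the pool. A toy model of that indexing (block size and class names are invented for illustration):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative)

class PagedKVCache:
    """Toy block-table indirection in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))            # physical block pool
        self.block_tables: dict[int, list[int]] = {}   # seq_id -> block ids

    def append_token(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Return (physical_block, offset) holding token `pos` of `seq_id`."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE == len(table):            # crossed a block boundary
            table.append(self.free.pop())              # allocate on demand
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool immediately."""
        self.free.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=64)
for pos in range(40):                                  # 40 tokens -> 3 blocks
    block, off = cache.append_token(seq_id=0, pos=pos)
cache.release(0)
```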
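For the Distributed Systems bullet, vLLM's offline API gives a feel for the serving layer involved; the model name and parallelism degree below are placeholders, and a true multi-node deployment additionally runs on top of a Ray cluster:

```python
from vllm import LLM, SamplingParams

# Placeholder model and parallelism degree; tensor_parallel_size shards the
# model's weights across GPUs on one node.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```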
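Finally, for the Tooling & Observability bullet: TTFT is the latency from request submission to the first streamed token, so a measurement harness only needs to timestamp the stream. `stream_tokens` is a hypothetical stand-in for whatever streaming client the engine exposes:

```python
import time
from typing import Iterator

def stream_tokens(prompt: str) -> Iterator[str]:
    # Hypothetical stand-in for the engine's streaming client.
    for tok in ["Hello", ",", " world"]:
        time.sleep(0.05)  # simulated per-token decode latency
        yield tok

def measure_ttft(prompt: str) -> None:
    start = time.perf_counter()
    ttft = None
    count = 0
    for count, _ in enumerate(stream_tokens(prompt), start=1):
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start
    print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {count / total:.1f} tok/s")

measure_ttft("ping")
```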