Senior Data Scientist (L2-Systems Software Engineer)

Job Location: Chennai, India

Monthly Salary: Not Disclosed
Posted on: 1 hour ago
Vacancies: 1

Job Summary

Specific contributions expected from the role:

  • Infrastructure as Code (IaC).
  • Inference Optimization: Develop and optimize high-throughput, low-latency inference engines for LLMs (e.g. Llama 3, Mistral) using C++ and CUDA.
  • Performance Engineering: Profile and eliminate bottlenecks across the software stack, from Python-level orchestration down to GPU kernel execution.
  • Memory Management: Implement advanced memory techniques such as KV cache optimization, PagedAttention, and model quantization (INT8/FP8/AWQ) to maximize hardware utilization.
  • Distributed Systems: Architect and maintain distributed serving systems capable of multi-node, multi-GPU inference using technologies such as Ray, vLLM, or TGI.
  • Framework Integration: Build and maintain high-performance Python bindings (Pybind11) for C++ backends to expose system-level optimizations to the AI research team.
  • Tooling & Observability: Build custom profiling tools and dashboards to monitor TTFT (Time to First Token), throughput, and hardware telemetry (SMI).
  • Proficiency in large models and deep neural networks, with hands-on experience working with them.
  • Expertise in, and working knowledge of, large language models (LLMs).
  • Extensive experience in system platform architecture.
  • Development experience, preferably with memory, storage, or other embedded systems.
  • In-depth knowledge of, and extensive experience with, standardization work, technical papers, and patents.
  • Extensive experience with C/C++ and Python programming.