Work alongside the Foundation Model Research team to optimize inference for cutting-edge models. Work closely with product teams to build production-grade solutions that launch models serving millions of customers in real time. Build tools to understand inference bottlenecks across different hardware and use cases, and guide engineers across the organization.
5 years of experience leading and driving complex, ambiguous projects.
Experience with high-throughput services, particularly at supercomputing scale.
Proficient in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
Familiar with GPU programming concepts using CUDA.
Familiar with one of the popular ML frameworks, such as PyTorch or TensorFlow.
Proficient in building and maintaining systems written in modern languages (e.g., Go, Python).
Familiar with fundamental deep learning architectures such as Transformers and encoder/decoder models.
Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Experience writing custom GPU kernels using CUDA or OpenAI Triton.