AI Engineer

Randstad India

Job Location: Bangalore - India
Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Job Specification: AI Platform Engineer
About the Role
We are seeking an AI Platform Engineer to build and scale the infrastructure that powers
our production AI services. You will take cutting-edge models, ranging from speech
recognition (ASR) to large language models (LLMs), and deploy them into highly
available, developer-friendly APIs.
You will be responsible for creating the bridge between the R&D teams that train models
and the applications that consume them. This means developing robust APIs, deploying
and optimizing models on Triton Inference Server (or similar frameworks), and ensuring
real-time, scalable inference.
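A minimal sketch of the API-to-Triton bridge described above follows, assuming a hypothetical ASR model named asr_model with AUDIO and TRANSCRIPT tensors and a Triton HTTP endpoint on localhost:8000; none of these names come from the posting.

# Sketch only: a FastAPI endpoint that fronts a model served by Triton Inference
# Server over HTTP. The model name ("asr_model"), tensor names ("AUDIO",
# "TRANSCRIPT"), and server URL are illustrative assumptions.
from typing import List

import numpy as np
import tritonclient.http as triton_http
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
triton = triton_http.InferenceServerClient(url="localhost:8000")


class TranscribeRequest(BaseModel):
    # 16 kHz mono PCM samples, decoded client-side (an assumption made for brevity).
    samples: List[float]


@app.post("/v1/transcribe")
def transcribe(req: TranscribeRequest) -> dict:
    # Pack the audio into the input tensor the (hypothetical) model expects.
    audio = np.asarray(req.samples, dtype=np.float32).reshape(1, -1)
    infer_input = triton_http.InferInput("AUDIO", list(audio.shape), "FP32")
    infer_input.set_data_from_numpy(audio)

    # Synchronous inference call; a production service would add batching,
    # timeouts, retries, and metrics around this.
    result = triton.infer(
        model_name="asr_model",
        inputs=[infer_input],
        outputs=[triton_http.InferRequestedOutput("TRANSCRIPT")],
    )
    transcript = result.as_numpy("TRANSCRIPT")
    text = transcript.reshape(-1)[0] if transcript is not None else b""
    return {"text": text.decode("utf-8") if isinstance(text, bytes) else str(text)}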
Responsibilities
API Development
Design, build, and maintain production-ready APIs for speech, language, and
other AI models.
Provide SDKs and documentation to enable easy developer adoption.
Model Deployment
Deploy models (ASR, LLM, and others) using Triton Inference Server or
similar systems.
Optimize inference pipelines for low-latency, high-throughput workloads.
Scalability & Reliability
Architect infrastructure for handling large-scale, concurrent inference
requests.
Implement monitoring, logging, and auto-scaling for deployed services (see the
metrics sketch after this list).
Collaboration
Work with research teams to productionize new models.
Partner with application teams to deliver AI functionality seamlessly through
APIs.
DevOps & Infrastructure
Automate CI/CD pipelines for models and APIs.
Manage GPU-based infrastructure in cloud or hybrid environments.
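As a hedged illustration of the monitoring responsibility above, the sketch below records per-endpoint request counts and latencies and exposes them in Prometheus text format, which dashboards or an autoscaler could consume. The metric names, labels, and endpoints are assumptions for illustration, not details from this posting.

# Sketch only: request counting and latency tracking for a FastAPI service,
# exposed in Prometheus text format. Metric names and labels are illustrative
# assumptions, not part of this posting.
import time

from fastapi import FastAPI, Request, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI()

REQUESTS = Counter(
    "inference_requests_total", "Total inference requests", ["path", "status"]
)
LATENCY = Histogram(
    "inference_request_seconds", "Inference request latency in seconds", ["path"]
)


@app.middleware("http")
async def record_metrics(request: Request, call_next):
    # Time every request and record its status code so dashboards and
    # autoscaling rules can be driven from these series.
    start = time.perf_counter()
    response = await call_next(request)
    LATENCY.labels(path=request.url.path).observe(time.perf_counter() - start)
    REQUESTS.labels(path=request.url.path, status=str(response.status_code)).inc()
    return response


@app.get("/metrics")
def metrics() -> Response:
    # Prometheus scrape endpoint; an autoscaler (via a metrics adapter) or a
    # dashboard would read these series from here.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)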
Requirements
Core Skills
Strong programming experience in Python (FastAPI, Flask) and/or Go for
API services.
Hands-on experience with model deployment using Triton Inference Server,
TorchServe, or similar.
Familiarity with both ASR frameworks and LLM frameworks (Hugging
Face Transformers, TensorRT-LLM, vLLM, etc.).
Infrastructure
Experience with Docker, Kubernetes, and managing GPU-accelerated
workloads.
Deep knowledge of real-time inference systems (REST, gRPC, WebSockets,
streaming); see the streaming sketch after this section.
Cloud experience (AWS, GCP, Azure).
Bonus
Experience with model optimization (quantization, distillation, TensorRT,
ONNX).
Exposure to MLOps tools for deployment and monitoring.
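Because real-time serving over REST, gRPC, and WebSockets figures prominently in the requirements above, here is a minimal sketch of streaming partial results over a WebSocket with FastAPI. The generate_tokens generator is a hypothetical stand-in for a real streaming backend (vLLM, TensorRT-LLM, or Triton's decoupled mode) and is not taken from this posting.

# Sketch only: streaming LLM tokens to a client over a WebSocket.
# generate_tokens is a hypothetical placeholder for a real streaming backend.
import asyncio
from typing import AsyncIterator

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


async def generate_tokens(prompt: str) -> AsyncIterator[str]:
    # Placeholder generator: echoes the prompt word by word with a small delay
    # to imitate incremental decoding. A real implementation would call the
    # inference backend's streaming API here.
    for word in prompt.split():
        await asyncio.sleep(0.05)
        yield word + " "


@app.websocket("/v1/generate")
async def generate(ws: WebSocket) -> None:
    await ws.accept()
    try:
        prompt = await ws.receive_text()
        # Send each partial token as soon as it is available so the client can
        # render output incrementally instead of waiting for the full response.
        async for token in generate_tokens(prompt):
            await ws.send_text(token)
        await ws.close()
    except WebSocketDisconnect:
        # Client disconnected mid-stream; a production service would also
        # cancel the backend generation request here.
        pass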
