Requirements:
- 4 years of experience as a fullstack or backend engineer
- Strong proficiency in Python and JavaScript/TypeScript
- Experience with FastAPI/Django and React
- Solid understanding of distributed systems and async architectures
- Hands-on experience deploying LLMs such as GPT-4/4.1, Claude, LLaMA, Mistral, and Mixtral
- Experience serving models using vLLM, Triton, TGI, or similar frameworks
- Strong understanding of transformer models and inference trade-offs
- Experience with embeddings, vector search, and RAG architectures
- Experience with AWS, GCP, or Azure (GPU workloads preferred)
- Strong Docker and Kubernetes experience
- Familiarity with CI/CD pipelines for ML systems
- Experience with observability tools (Prometheus, Grafana, OpenTelemetry)
- Experience with multimodal AI (audio, video, and image models)
- Experience optimizing LLM inference costs at scale
- Startup or high-growth environment experience
- Prior work on AI-first or AI-native products
Responsibilities:
- Deploy and optimize LLMs (open-source and commercial) for production use
- Implement inference optimization techniques (quantization, batching, caching, distillation)
- Build and maintain RAG pipelines (embeddings, vector databases, retrieval strategies)
- Evaluate and improve model quality (latency, accuracy, hallucination reduction, cost)
- Implement prompt management, versioning, and A/B testing
- Design and develop scalable APIs for AI-driven features
- Deploy and manage model-serving infrastructure (Docker, Kubernetes, GPUs)
- Optimize hardware utilization for inference workloads
- Implement monitoring, logging, and alerting for AI services
- Ensure security, data privacy, and compliance across AI pipelines
- Build internal tools and user-facing interfaces for AI workflows
- Integrate LLM services into web and mobile applications
- Work closely onsite with product managers, designers, and data teams
- Rapidly prototype, test, and iterate on AI-powered features