About Contextual AI
We're revolutionizing how AI Agents work by solving AI's most critical challenge: context. The right context at the right time unlocks the accuracy and production scale that enterprises leveraging AI require. Our enterprise AI development platform sits at the intersection of breakthrough AI research and practical developer needs. Our end-to-end platform allows AI developers to easily and accurately ingest and query documents from enterprise data sources and to embed retrieval results into their business workflows.
Contextual AI was founded by the pioneers of Retrieval-Augmented Generation (RAG), the foundational technique behind the context layer connecting foundation models to current and relevant data. Backed by the industry's most forward-thinking venture capitalists, we're not just participating in the enterprise AI revolution, we're defining it. Join us in building a future where AI doesn't just answer questions, it transforms businesses.
About the role
As a Member of Technical Staff specializing in Research Engineering, LLM Systems & Performance, you will be part of a small, high-impact team building and optimizing LLM systems end-to-end, from Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) pipelines to high-throughput inference clusters in production. You will collaborate closely with researchers and engineers to develop advanced models and infrastructure for the context layer.
What you'll do
- Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation.
- Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations.
- Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth) using profilers such as Nsight to identify and fix bottlenecks.
- Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters.
- Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed precision) for both training and inference.
- Write and optimize GPU kernels using tools like CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate.
- Collaborate with researchers to take ideas from paper to prototype to scaled experiments to production.
- Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform, and Products).
What we're seeking
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience).
- Strong programming skills in Python.
- Experience with at least one major ML framework: PyTorch or JAX.
- Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs. compute, etc.).
- Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching).
- Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency.
- Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers.
Location: Mountain View, CA.
Salary Range for California-Based Applicants: $170,000 - $200,000 + equity + benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).
Equal Opportunity
Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.
Required Experience:
Staff IC