Lead AI Platform
Job Summary
Integrant is looking for game changers to join our team as a Lead AI Platform Engineer.
The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on the NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems.
This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.
Roles and Responsibilities:
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch, ONNX, TensorRT)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments, including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure optimization and GPU systems
- Use experiment tracking tools (MLflow, W&B, Neptune) to log parameters, metrics, and artifacts for comparison
- Detect post-deployment model degradation (concept drift, data pipeline changes, traffic pattern shifts)
- Apply root cause analysis (RCA) to ML systems by isolating variables and reproducing issues
Requirements
- 8+ years of experience in AI/ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching)
- Hands-on experience with:
- PyTorch
- ONNX / ONNX Runtime
- TensorRT / TensorRT-LLM
- Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
- NCCL communication
- NVLink / InfiniBand
- Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
Nice to Have
- Experience with NVIDIA NIM and NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments
Benefits
- Salary paid in USD
- Career-advancing opportunities every six months
- Supportive and friendly work environment
- Premium medical insurance for employees and their families
- English language development courses
- Interest-free loans paid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employment referral program
- Premium location in Maadi
- Social insurance
About the Company
Integrant, Inc. is a custom software development company focused on providing tailor-made software solutions that fit your needs to a tee. We strive to uncover your pain points and identify how our team can seamlessly integrate with you and your business for a one-team approach.