Senior AI Infrastructure Engineer

IO TECH SOLUTIONS LIMITED

Posted on : 19-08-2025

Employer Active

1 Vacancy

Job Location

Beijing - China

Monthly Salary

Not Disclosed

Job Description

Responsibilities

1. Full-Stack AI Infrastructure Architecture & Development:

Build a full-stack AI infrastructure system for quantitative scenarios based on Kubernetes, unifying the management of heterogeneous computing resources (e.g. GPU pooling).
Integrate high-performance communication layers (e.g. RDMA) and drive the unified development of AI training/inference platforms and GPU operation/maintenance platforms.
Streamline the end-to-end workflow from resource scheduling to model deployment, enhancing system efficiency and stability.

2. Intelligent Computing Power Scheduling System Design:

Design a global scheduling mechanism that supports multiple task types and priority strategies, leveraging Volcano scheduler capabilities.
Lead the customization and maintenance of Volcano and core Operators, optimizing elastic scaling and resource utilization based on the dynamic demands of quantitative tasks.

3. Hardware-Software Co-Optimization & System Reliability:

Develop an intermediate layer bridging underlying hardware (GPU/networking/storage) and AI frameworks (PyTorch/TensorFlow).
Build elastic GPU resource pools, fault self-healing mechanisms, and unified observability platforms (e.g. monitoring dashboards).
Ensure high-efficiency iteration and high availability of large-scale model training through performance tuning and automated operations.

4. Technical Foresight & Architecture Evolution:

Drive long-term AI infrastructure roadmap planning, anticipating quantitative business needs in computing scale, training efficiency, and cost control.
Explore and validate cutting-edge architectures (e.g. heterogeneous computing fusion, compute-storage separation, serverless AI) to enhance infrastructure capabilities and raise technical barriers.

Qualifications

1. Bachelor's/Master's degree in Computer Science or a related field; 5-10 years of experience, with strong self-motivation and the execution ability to identify and resolve technical bottlenecks.

2. Deep expertise in AI infrastructure: Kubernetes, GPU resource management, RDMA/high-performance networking, and large-scale distributed AI system design/deployment.

3. Proficient in *Golang/Python*, with solid system programming and automation skills. Priority given to candidates with experience in *Volcano/Kueue schedulers, K8s Operator development, or open-source contributions*.

4. Familiar with core resource scheduling principles, GPU lifecycle management (allocation, isolation, elasticity, fault tolerance), and designing high-availability, low-latency strategies for quantitative tasks.

5. Knowledge of mainstream AI frameworks (PyTorch/TensorFlow), with experience in training/inference performance optimization and in cross-team collaboration on framework-infrastructure co-optimization.

6. Preferred: Experience in *FinTech/quantitative AI infrastructure*, an understanding of business-critical computing demands, and the ability to drive cross-team collaboration and value delivery.

Employment Type

Full Time
