Infrastructure Engineer - Software Engineer – Infrastructure & Hardware Optimization - Remote

Employer Active

1 Vacancy

Job Location

Houston - USA

Monthly Salary

Not Disclosed

Job Description

Hello,

Infrastructure Engineer - Software Engineer – Infrastructure & Hardware Optimization - Remote

We have the below job opening. If you are interested and your experience matches the job description, please send your updated resume.

 

Software Engineer – Infrastructure & Hardware Optimization

Location: SF, CA; Portland, OR; or Dallas, TX - Remote, but candidates must be local to the respective location

Duration: 6-Month Contract

 

Job Description: We are seeking a skilled low-level systems engineer to join the team. This individual will focus on infrastructure software that detects, configures, and optimizes AI inference pipelines across heterogeneous hardware accelerators (e.g., NVIDIA/AMD GPUs, TPUs, AWS Inferentia, FPGAs). You will work on hardware abstraction layers, containerized runtime environments, benchmarking, telemetry, and driver orchestration logic for multi-cloud agentic inference deployments.

 

Ideal Experience:

 

4-7 years of experience in systems software or infrastructure engineering, preferably with exposure to AI/ML workloads.

 

Deep expertise in CUDA, NCCL, ROCm, or other accelerator programming frameworks.

 

Familiarity with LLM inference runtimes (TensorRT-LLM, vLLM, ONNX Runtime).

 

Experience with Kubernetes scheduling, device plugin development, and runtime patching for heterogeneous compute.

 

Strong Python/C and Linux systems programming skills.

 

Passion for building scalable, portable, and secure AI infrastructure.

 

Responsibilities:

 

Design and implement cross-platform hardware detection systems for GPUs/TPUs/NPUs using CUDA, ROCm, and low-level runtime interfaces.
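A minimal sketch of what such cross-vendor detection can look like is shown below; it assumes the vendor CLI tools (nvidia-smi, rocm-smi) are available on PATH and is illustrative only, not a prescribed implementation.

```python
"""Hedged sketch: detect NVIDIA and AMD accelerators via vendor CLI tools."""
import shutil
import subprocess


def detect_nvidia_gpus() -> list[str]:
    """Return NVIDIA GPU names reported by nvidia-smi, or an empty list if absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


def detect_amd_gpus() -> list[str]:
    """Return one entry per AMD GPU line reported by rocm-smi, if installed."""
    if shutil.which("rocm-smi") is None:
        return []
    out = subprocess.run(["rocm-smi", "--showid"], capture_output=True, text=True, check=False)
    return [line.strip() for line in out.stdout.splitlines() if line.strip().startswith("GPU")]


if __name__ == "__main__":
    print("NVIDIA:", detect_nvidia_gpus())
    print("AMD:", detect_amd_gpus())
```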

 

Build and maintain plugin-based infrastructure for capability scoring, power-efficiency tuning, and memory optimization.

 

Develop hardware abstraction layers (HAL) and performance benchmarking tools to optimize AI agents for cloud-native inference.
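For illustration, a bare-bones HAL interface might look like the Python sketch below; the class and method names (AcceleratorBackend, memory_info, run_benchmark) are assumptions for this example, not an API defined by the role.

```python
"""Hedged sketch: a minimal hardware abstraction layer (HAL) interface."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
import time


@dataclass
class MemoryInfo:
    total_bytes: int
    free_bytes: int


class AcceleratorBackend(ABC):
    """Common interface that vendor-specific backends (CUDA, ROCm, ...) would implement."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def memory_info(self) -> MemoryInfo: ...

    def run_benchmark(self, workload, iterations: int = 10) -> float:
        """Return mean wall-clock seconds per iteration for a callable workload."""
        start = time.perf_counter()
        for _ in range(iterations):
            workload()
        return (time.perf_counter() - start) / iterations


class CpuFallbackBackend(AcceleratorBackend):
    """Reference backend so the interface can be exercised without a GPU."""

    def name(self) -> str:
        return "cpu"

    def memory_info(self) -> MemoryInfo:
        # Placeholder values; a real backend would query the runtime (e.g. cudaMemGetInfo).
        return MemoryInfo(total_bytes=0, free_bytes=0)
```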

 

Extend container-based MLOps systems (Docker/Kubernetes) with support for hardware-specific runtime containers (e.g., TensorRT, vLLM, ROCm).

 

Automate driver validation, container security hardening, and runtime health monitoring across deployments.

 

Integrate telemetry systems (Prometheus, Grafana) to surface per-device inference performance metrics and health status.
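A small sketch of per-device telemetry exported to Prometheus using the prometheus_client library is shown below; the metric names and the random sampling loop are illustrative placeholders, not metrics specified by this posting.

```python
"""Hedged sketch: expose per-device inference metrics to Prometheus."""
import random
import time

from prometheus_client import Gauge, start_http_server

# One time series per device via the "device" label.
INFERENCE_LATENCY_MS = Gauge(
    "inference_latency_ms", "Last observed inference latency per device", ["device"]
)
DEVICE_UTILIZATION = Gauge(
    "device_utilization_ratio", "Accelerator utilization (0..1) per device", ["device"]
)

if __name__ == "__main__":
    start_http_server(9400)  # metrics served at http://localhost:9400/metrics
    devices = ["gpu0", "gpu1"]
    while True:
        for dev in devices:
            # Replace the random samples with real readings (e.g. NVML / ROCm SMI).
            INFERENCE_LATENCY_MS.labels(device=dev).set(random.uniform(5, 50))
            DEVICE_UTILIZATION.labels(device=dev).set(random.random())
        time.sleep(5)
```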

 

Collaborate with solutions and DevOps teams to ensure hardware-aware agent deployment across cloud providers.
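As one example of a hardware-aware deployment check, the sketch below lists the accelerator resources each cluster node advertises through the standard NVIDIA/AMD device plugins; it assumes the official kubernetes Python client and a reachable kubeconfig, and is a sketch rather than part of the role's actual tooling.

```python
"""Hedged sketch: report accelerator resources advertised by cluster nodes."""
from kubernetes import client, config


def node_accelerators() -> dict[str, dict[str, str]]:
    """Map node name -> advertised accelerator resources (counts as strings)."""
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    report = {}
    for node in v1.list_node().items:
        allocatable = node.status.allocatable or {}
        accels = {
            resource: count
            for resource, count in allocatable.items()
            if resource in ("nvidia.com/gpu", "amd.com/gpu")
        }
        if accels:
            report[node.metadata.name] = accels
    return report


if __name__ == "__main__":
    for name, accels in node_accelerators().items():
        print(name, accels)
```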


Additional Information:

All your information will be kept confidential according to EEO guidelines.


Remote Work:

Yes


Employment Type:

Contract
