Product Manager, AI Inference & Model Serving

Mirantis

Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Mirantis is looking for a commercially driven, deeply technical Product Manager to own AI inference and model serving for k0rdent AI, our control plane for GPU infrastructure and distributed AI workloads. This role sits at the intersection of AI inference, cloud-native infrastructure, distributed systems, and performance engineering. You will define how NeoClouds and enterprise customers deploy, scale, and operate production inference services while extracting maximum performance from the underlying GPU, network, and storage infrastructure.

This role owns product strategy and solution development for inference products across on-premises, cloud, and edge environments. The scope includes serverless inference, dedicated endpoints, workload placement, autoscaling, routing, lifecycle management, observability, and full-stack performance optimization. This person will define how customers run production model-serving workloads at scale while improving latency, throughput, utilization, reliability, cost, and operational control.

The ideal candidate has experience with high-performance infrastructure products and understands how production systems behave under real-world load. They should be comfortable reasoning across the full stack, identifying performance bottlenecks, evaluating system design trade-offs, and translating technical insight into clear product requirements, architecture direction, and customer-facing solutions.

Responsibilities

Own product strategy, roadmap, and lifecycle for inference and model serving, including serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and the related observability
Lead deep technical discovery with NeoClouds, sovereign clouds, and enterprise platform teams, and translate findings into prioritized requirements and architecture direction
Partner with engineering on system design trade-offs across runtime integration, GPU scheduling, network, storage, and serving topology, including disaggregated serving and multi-model serving
Define positioning grounded in measurable outcomes: latency distributions, throughput per GPU, utilization, tail reliability, and cost per token
Drive go-to-market execution: pricing and packaging, reference architectures, sizing guides, PoC playbooks, and direct engagement with customers, analysts, and ecosystem partners

Qualifications

7 years in product management, technical product management, or a senior technical role owning AI/ML and inference product(s)
Strong understanding of production AI inference, including model serving, serverless execution, dedicated endpoints, autoscaling, routing, workload placement, observability, and reliability
Proven capability to reason about performance trade-offs across GPU, network, storage, orchestration, and runtime layers, and to translate low-level technical capability into business value such as TTFT, throughput per GPU, and TCO
Working knowledge of modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the optimization patterns that matter in production: continuous batching, KV cache management, cold starts, prefill versus decode, disaggregated serving, and multi-model serving
Credibility with engineering leaders and infrastructure operators, including comfort in production architecture reviews and technical commercial conversations with platform engineering buyers

Why you'll love Mirantis

Build the token factory foundation for the AI cloud era, working directly with leading GPU cloud operators, NeoClouds, sovereign clouds, and AI-first enterprises
Collaborate with a world-class distributed team committed to openness and technical excellence
Shape the product narrative and influence go-to-market success

Additional Information

What does Mirantis offer you?

Work with an established Silicon Valley leader in the cloud infrastructure industry.
Work with exceptionally passionate, talented, and engaging colleagues, helping Fortune 500 and Global 2000 customers implement next-generation cloud technologies.
Be a part of cutting-edge open-source innovation.
Thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued.
Professional development and training.
Attend conferences and working groups.
Customized workstation (macOS, Windows).
A competitive compensation package with a strong benefits plan and stock options.

It is understood that Mirantis Inc. may use automated decision-making technology (ADMT) for specific employment-related decisions. Opting out of ADMT use is requested for decisions about evaluation and review connected with the specific employment decision for the position applied for. You also have the right to appeal any decisions made by ADMT by sending your request to

By submitting your resume, you consent to the processing and storage of your personal data in accordance with applicable data protection laws for the purposes of considering your application for current and future job opportunities.

We are a Leader for Container Management in G2 (#2 after AWS)!

Remote Work:

Yes

Employment Type:

Full-time

About Company

Mirantis

Mirantis is an open cloud company that helps organizations achieve digital self-determination by giving them complete control over their strategic infrastructure. The company combines intelligent automation and cloud-native expertise for managing and operating virtual machines, contai ...
