Senior Product Manager, AI Observability

Clockwork.io

Job Location:

Palo Alto, CA - USA

Monthly Salary: Not Disclosed
Posted on: 9 hours ago
Vacancies: 1 Vacancy

Job Summary

About Clockwork Systems

Software-Driven Fabrics to increase GPU cluster utilization

Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet the demands of performance, reliability, and precise coordination. Clockwork is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion.

To learn more visit .

About the Role

As Senior Product Manager for AI Observability, you will lead the product strategy and execution for Clockwork's cross-stack observability solution, which helps customers detect slow or failing workloads and precisely correlate them with underlying infrastructure issues. You'll work at the forefront of the emerging AI market, bringing world-first observability technologies to life.

What You Will Do

  • Define and drive product strategy and roadmap for Clockwork's AI Observability portfolio, covering Fleet Audit (pre-flight validation), Fleet Observability (to uncover and solve fabric issues in real time), and AI Workload Observability (to identify workload issues and correlate them to the underlying infrastructure).
  • Develop a deep understanding of pain points and workflows by working directly with customers, and crisply translate them into compelling and differentiated product requirements.
  • Drive rapid end-to-end execution: write PRDs, set priorities, unblock teams, make tradeoffs, and ensure high-quality releases.
  • Partner cross-functionally with engineering, sales, and marketing to shape the product, ship reliably, and communicate clear value to technical customers.
  • Be the voice of the product internally

What We're Looking For

  • 7 years of Product Management experience with at least some time working in the observability space
  • Strong experience with modern observability stacks: metrics, logs, traces, OpenTelemetry, Prometheus/Grafana. Familiarity with GPU observability tooling (e.g., NVIDIA DCGM, Nsight) and experience with MLOps and LLMOps ecosystems are a plus.
  • Strong technical depth in Kubernetes, SLURM, AI training and related components (e.g., PyTorch, NCCL), GPU clusters, and RDMA networking (InfiniBand and RoCE)
  • Excellent product leadership: clear writing, crisp tradeoffs, strong prioritization, and the ability to collaborate effectively with highly technical engineering teams
  • Customer empathy and discovery strength: able to identify high-impact pain points and convert them into compelling product strategy and execution.
  • A builder mindset that is energized by early-stage products, rapid iteration, customer closeness, and shipping market-changing solutions.

Enjoy

  • Challenging projects.
  • A friendly and inclusive workplace culture.
  • Competitive compensation.
  • A great benefits package.
  • Catered lunch.

Clockwork Systems is an equal opportunity employer. We are committed to building world-class teams by welcoming bright, passionate individuals from all backgrounds. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, religion, age, sex, sexual orientation, gender identity or expression, national origin, disability, or protected veteran status. We believe diversity drives innovation, and we grow stronger together.


Required Experience:

Senior IC

Key Skills

  • Time Management
  • Data Analytics
  • Analytical
  • Agile
  • Requirement Gathering
  • Strategic thinking
  • Visio
  • Communication
  • Problem Solving
  • Market Research
  • UML
  • Cross Functional Teams