Engineering Manager, Cloud Platform

Verdigris

Job Location:

Palo Alto, CA - USA

Monthly Salary: Not Disclosed
Posted on: 5 days ago
Vacancies: 1 Vacancy

Job Summary

GPU racks pull 120–140 kW today. By 2027 that number hits 600 kW to 1 MW per rack. The entire AI buildout, hundreds of billions in capex, is being erected on a grid that was not designed for it. Design margins have compressed from 30% to 10–15%. The monitoring systems built for the last generation of infrastructure poll at one-second intervals. GPU workloads ramp in eight milliseconds.

AI is accelerating faster than the infrastructure beneath it can be understood.
The incumbent vendors (Schneider, Eaton, Vertiv) were built for a world where loads were predictable and slow. They are not broken. They are mismatched to what AI infrastructure demands. Verdigris captures continuous waveforms at 8 kHz. That is not a software improvement on existing monitoring data. It is a different measurement entirely, one that makes visible what no other system can see: hidden degradation, safe operating headroom, and the real-time electrical behavior of infrastructure running at the edge of its design limits.
We are not a monitoring solution. We are the electrical intelligence layer: the validation layer that sits between the physical environment and the autonomous control systems the industry is building toward. Solving this matters beyond the business case. Carbon-free AI, stranded capacity recovery, and the long-term reliability of the compute layer the world is betting on all depend on getting electrical intelligence right at the physical layer.

The company
Twenty people. Lean by design. We have raised serious capital, refocused the company around the most consequential problem in AI infrastructure, and come out the other side with real customers, real revenue, and hardware that has been running in colocation and owned data center facilities for more than a decade. The cloud platform processes billions of 8 kHz waveform readings and turns them into validated operating limits that operators use daily.
This unique position, built on our high-fidelity 8 kHz metering, converts the strain on electrical infrastructure into a definitive roadmap for solving the AI industry's most critical power bottleneck and for driving the sector's next wave of technological improvement.
Today that means reliability and early warning. Tomorrow it means capacity optimization and machine-facing orchestration APIs that GPU schedulers consume directly.

The role
We are hiring an Engineering Manager to own the cloud platform, the system that makes all three product pillars work: Observability, Intelligence, and Orchestration.
You would manage a team of elite engineers, report to the cofounder/CTO, and hold a mandate to raise the bar on how this team builds and ships. This is a player-coach role. You will set direction, run the engineering operating cadence, and manage people. You will also read code, debug production issues, and make architectural calls. If you have not been in a codebase recently, this is not the right fit.
We are building the management layer to accelerate toward best-in-class industry standards: clear ownership, a culture of high craft, and leadership that empowers and accelerates rather than administrates. The candidate we want believes in this velocity.
One more thing: a big part of how we operate is through deliberate, opinionated use of agentic coding tools. The team is actively migrating toward an AI-native culture and learning how to adopt practices that scale. You will be instrumental in defining and coaching the next standard for AI-native development here, and you will recruit and coach to that standard.

The situation
The platform works. Customers depend on it. The 8 kHz ingestion pipeline is real and running in production.
The platform is at a strategic inflection point: we must mature the architecture and organizational structure to support the scale and velocity of our next-generation product roadmap. We need someone who can take ownership of the platform, organize the team around clear ownership, and raise the quality bar while also building toward future application layers that do not exist yet.

First 6 months

  • Audit the platform: reliability, scalability, observability, tech debt. Form your own view, not just ours.
  • Organize ownership across the three-pillar stack: ingestion and the 8 kHz pipeline; ML signal processing and validated operating limits; the APIs, MCPs, and workflows that deliver them.
  • Stand up an engineering operating cadence: roadmap reviews, incident reviews, delivery planning, architecture reviews.
  • Get your hands dirty on the hardest reliability and performance problems. Ship fixes, not just plans.
  • Establish AI-native development practices on the team. Not a policy, but real tooling, norms, and a shared view on where agentic coding accelerates and where it creates new risk.
  • Identify hiring gaps and start filling them. Raise the bar on who we bring in.

By 12 months, here is what success looks like

  • Platform reliability and deployment velocity are measurably better. Fewer fires, faster fixes.
  • The team ships consistently with clear ownership. They do not need you in every decision.
  • There is an engineering roadmap people trust, one that connects today's reliability work to the capacity optimization and orchestration capabilities we are building toward.
  • You have made at least two hires who made the team noticeably stronger.
  • We are capitalizing on well-architected foundations, enabling us to move up the value delivery chain with our customers through a suite of well-thought-through applications.
  • The platform is positioned to support machine-facing orchestration APIs: the layer where validated intelligence feeds directly into GPU schedulers and demand response systems.

What we are looking for

  • Real technical depth in cloud infrastructure, data systems, or ML platforms. You can review architecture, debug production, and make tradeoffs, not just delegate them.
  • You have inherited or built a small team before and made it better. You set expectations, build ownership, and coach people up.
  • You can operate without a clean roadmap. You turn ambiguity into a plan with owners and timelines.
  • You care about production quality: observability, incident response, release discipline. You build the habits, not just the systems.
  • You have strong opinions about how agentic coding tools change what a small team can build. You are actively shaping how your team works with AI, and you have the judgment to know where it helps and where it introduces new failure modes.
  • You are pulled by the mission. AI infrastructure is being built on a foundation that was not designed for it. Verdigris is the layer that makes it trustworthy. That framing should feel meaningful to you, not just interesting.

Why this role

  • You would work directly with the founding team and own the platform that makes the product work.
  • The company is small enough that your decisions show up in the product and the culture within months. A lean team operating with the right practices and the right people can build like a team ten times its size. You will define what that looks like here.
  • The 8 kHz ingestion pipeline is already running in production. You are not starting from zero. You are taking something real and making it significantly better on infrastructure that actually matters.
  • If you are at a bigger company wondering whether you will ever get to build something from a position of real ownership, this is that role.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Required Experience:

Manager

About Company

Verdigris enables smart buildings through AI and proprietary real-time energy monitoring hardware. Verdigris delivers insights on energy usage per device when it's critical to see it.