About Fluidstack
At Fluidstack, we're building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises, including Mistral, Poolside, Black Forest Labs, Meta, and more, to unlock compute at the speed of light.
We're working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If you're motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.
About the Role
Fluidstack is seeking a Lead, NOC & Incident Management to build and lead our cross-functional operations center (NOC) and incident management execution function. You'll shape how Fluidstack detects, triages, and responds to operational events across our entire AI infrastructure portfolio, from datacenter facilities to network backbone to internal platform services.
This role demands equal parts operational leadership and technical capability. You'll build the 24/7 monitoring and triage function, operationalize our incident management framework, and establish the operational culture that enables Fluidstack to meet stringent customer SLAs.
Success means Fluidstack's infrastructure teams stop spending time on operational toil (alert monitoring, carrier ticket management, incident bridge setup, shift coverage gaps) and instead focus on engineering and reliability work. You're the person who ensures someone is always watching the glass, incidents are handled consistently, and post-incident learning actually happens.
Focus
NOC Build & Operations: Stand up the cross-functional operations center from scratch. Assist in selecting and onboarding an MSP partner for Tier 1 coverage. Build staffing models, handoff processes, KPIs, and quality standards. Own the single question: is someone qualified watching every alert, 24/7?
Incident Management Execution: Create, deploy, and operationalize Fluidstack's incident management framework. Manage the Incident Manager on-call rotation. Train engineers on incident roles. Run incident bridges during SEV0/SEV1 events. Ensure post-incident reviews happen on schedule and action items actually close. Partner with the Program Manager (process owner) to continuously improve the framework based on real-world execution.
Operational Readiness: Own the "are we ready?" question for every new domain onboarded to the NOC. Drive runbook quality assurance with functional teams. Plan and execute tabletop exercises. Coordinate with the Platform team on tooling workflows. Onboard new infrastructure domains (Facilities, Network, Systems) into NOC coverage on a phased schedule aligned with datacenter launches.
Cross-Functional Orchestration: Build tight operational partnerships with the Network Ops, DC Ops, Systems/Platform, and Security teams. Define clear Tier 1 to Tier 2 escalation criteria for each domain. Ensure the NOC acts as a force multiplier for engineering teams by absorbing monitoring, triage, vendor ticket management, and incident coordination.
Vendor & Carrier Ticket Lifecycle: Establish processes for the NOC to manage the full lifecycle of carrier and vendor tickets: creation, tracking, SLA enforcement, and escalation. Work with Network Ops and DC Ops to define ticket templates, escalation triggers, and vendor communication standards. Ensure no ticket falls through the cracks and every carrier/vendor interaction is documented.
Metrics & Continuous Improvement: Establish operational metrics (MTTA, MTTR, escalation rate, false positive rate, runbook coverage) and a reporting cadence. Use data to identify patterns, reduce alert noise, improve runbook quality, and drive down incident response times. Produce monthly operational reports for leadership and customer-facing stakeholders.
About You
Proven NOC/Operations Center Leadership: 5+ years in network operations, infrastructure operations, or site reliability roles, with significant experience running and building a NOC, operations center, or equivalent 24/7 monitoring function. You've built shift models, managed MSP relationships, and know how to turn a collection of monitors into a high-performing operational team. Ideally, you've done this at global scale.
Incident Management Expertise: Deep experience with structured incident response processes: severity classification, escalation matrices, incident bridges, post-incident reviews, and RCA workflows. You've been an Incident Manager or Incident Commander for major incidents, and you know what good looks like under pressure. You understand that incident management is a skill that requires training, practice, and continuous refinement.
Technical Credibility Across Domains: You don't need to be the deepest expert in network engineering, facilities, or systems, but you need enough technical breadth to triage alerts intelligently, ask the right questions during incidents, and earn the trust of the engineering teams you'll partner with. Experience with datacenter infrastructure (network, power, cooling) and modern monitoring stacks (Prometheus/VictoriaMetrics, Grafana, Alertmanager) is strongly preferred.
Process Builder, Not Just Process Follower: You've built operational processes from scratch in environments where they didn't exist before. You know how to design runbooks that contract operators can execute reliably, escalation criteria that are crisp enough to be actionable, and training programs that get new team members productive quickly. You iterate based on real-world feedback, not theoretical perfection.
Cross-Team Influence: You're exceptional at building partnerships across functional teams without direct authority. You've navigated the dynamics of getting engineering teams to write runbooks, participate in on-call rotations, and take post-incident actions seriously. You lead through credibility, follow-through, and consistent operational excellence rather than organizational hierarchy.
Customer SLA Mindset: You understand that operational metrics aren't just internal targets; they're the foundation of customer trust. You've worked in environments with stringent SLAs, and you know how to build the operational discipline required to consistently meet them. You think about every process decision through the lens of "what happens when this matters at 2 AM?"
Nice to Haves
Hyperscale or Large-Scale Infrastructure Background: Experience operating NOCs/operations centers at hyperscale companies (Meta, Google, Microsoft, AWS), large telcos, or major AI infrastructure providers. You've seen what mature operations look like at scale and can adapt those patterns to a fast-growing startup.
Incident Management Tooling: Hands-on experience with incident management platforms (e.g., PagerDuty, Opsgenie, ServiceNow), including configuration of escalation policies, on-call schedules, and alert routing. Bonus if you've led a platform migration or stood up a new instance from scratch.
MSP/Vendor Management: Experience selecting, onboarding, and managing managed service providers for NOC or operations functions. You've written SOWs, negotiated SLAs, and managed the transition from outsourced to internal operations.
Facilities & BMS Familiarity: Exposure to datacenter facilities operations: power distribution, cooling systems, CDUs, and BMS/SCADA alerting. You don't need to be a mechanical engineer, but understanding facilities alert triage is valuable, since Facilities is the MVP domain for the NOC.
Carrier & ISP Operations: Experience managing carrier relationships, circuit troubleshooting, and vendor ticket workflows. Familiarity with carrier NOC processes, circuit ID management, and SLA enforcement.
Startup Experience: You've built something from scratch before, ideally in a high-growth infrastructure or cloud company. You're comfortable with rapid context switching, evolving requirements, and the intensity of early-stage company building.
Salary & Benefits
Competitive total compensation package (salary + equity).
Retirement or pension plan in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy in line with local norms.
The base salary range for this position is $200,000 to $300,000 per year, depending on experience, skills, qualifications, and location. This range represents our good-faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.
We are committed to pay equity and transparency.
Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
You will receive a confirmation email once your application has been successfully accepted. If there is an error with your submission and you did not receive a confirmation email, please email us with your resume/CV, the role you've applied for, and the date you submitted your application. Someone from our recruiting team will be in touch.