Network Engineer, Operations & Repair

New York City, NY - USA

Salary: $150,000 - $250,000 per year

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About Fluidstack

At Fluidstack, we're building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises - including Mistral, Poolside, Black Forest Labs, Meta, and more - to unlock compute at the speed of light.

We're working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If you're motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.

About the Role

Fluidstack is seeking a Network Engineer, Operations & Repair. This role combines hands-on Tier 2/3 network operations with site operational responsibilities. You'll be the expert for your assigned region, ensuring network reliability through incident response, break-fix coordination, and operational excellence. You'll work remotely when workload allows but be onsite as needed for deployments, complex troubleshooting, and critical incidents.

This role is ideal for experienced network operators who want ownership of a datacenter campus while being part of a broader operations organization. You'll partner closely with the Operations & Reliability pillar lead, the centralized NOC for Tier 1 escalations, and cross-functional teams including Deployment, Hardware, and DC Operations. Success means maintaining high availability for your region, building strong relationships with onsite teams, and growing into regional operations leadership as the team scales.

Focus

Regional Operations Ownership: Serve as the primary network operations contact for a datacenter region. Own network health, respond to incidents escalated from the NOC, and ensure fabrics run reliably. Build deep knowledge of your region's network topology, common failure modes, and operational characteristics.
Tier 2 Incident Response: Handle network incidents escalated from the Tier 1 NOC during your coverage window. Troubleshoot complex issues across physical and logical layers, coordinate with other engineers for follow-the-sun coverage, and drive incidents to resolution. Lead incident response when you're the subject matter expert.
Break-Fix Coordination: For escalated and assigned incidents, coordinate with hardware repair teams onsite.
Operations Support: Support the RMA case process and escalations with supplier support teams. Build and support per-region dashboards and multi-region aggregate observability. Manage field testing of repair and other operations processes and automation, providing visibility and feedback to the partners developing the tooling.
Deployment Support: Provide operational support for datacenter deployments and expansions in your region. Partner with Deployment teams on turn-up activities, validate production readiness, and ensure smooth handovers from deployment to operations. Be the person who ensures new pods integrate seamlessly into operational workflows.
Runbook Execution & Improvement: Build and execute operational runbooks for both repair and non-repair activities. Identify gaps in runbooks, document lessons learned, and provide feedback to the Operations lead on runbook improvements.
Cross-Team Collaboration: Build relationships with onsite DC Operations teams, structured cabling vendors, and hardware logistics partners. Serve as the network engineering liaison for your datacenter region. Communicate clearly about network status, planned maintenance, and operational issues.

About You

Strong Operations Background: 5-8 years in network engineering with significant hands-on operational experience. You've run production networks, responded to incidents at all hours, and debugged complex failures under pressure. You understand the difference between "working" and "production-ready."
Analytics and Dashboards: Basic SQL and dashboard experience with Grafana, Tableau, or similar query/dashboard services. Basic Python 3 with Jupyter notebooks or scripts.
Datacenter Fabric Expertise: Deep experience operating modern datacenter networks, including EVPN/VXLAN, BGP, Clos topologies, and high-radix switching. You're comfortable troubleshooting Layer 2/3 issues, BGP routing problems, fabric misconfigurations, and physical media failures.
Incident Response Excellence: Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution. You remain calm during outages, communicate clearly with stakeholders, and know when to escalate versus when to dig deeper. You've been the person others call when things break.
Matrix Leadership Experience: You understand how to build relationships with onsite teams, coordinate physical infrastructure work, and represent network engineering in a field environment. You know how to get things done in operational settings with many internal and external teams and stakeholders.
Operational Pragmatism: You balance perfection with progress. You can troubleshoot with imperfect information, make pragmatic decisions under time pressure, and prioritize based on business impact. You document as you go and continuously improve operational processes.
Hybrid Work Comfort: You're productive working remotely but understand that datacenter operations sometimes require hands-on presence. You're comfortable with 30-40% travel and flexible schedules that adapt to operational needs - sometimes remote, sometimes onsite for days or weeks during critical periods.

Nice to Haves

AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.
Regional/Campus Operations Leadership: You've been a site lead, campus engineer, or regional operations lead before. You know how to coordinate across teams in a specific geographic location while reporting into a centralized organization.
Hardware Break-Fix Experience: Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work. You understand datacenter logistics, vendor escalation processes, and how to work effectively with onsite technicians.
Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection. You've used monitoring tools to diagnose issues proactively and tune alerting to reduce noise. You have experience with SQL, MySQL, and building operations dashboards.
Automation Exposure: Basic scripting or automation experience (Python, Ansible) for operational tasks. You may not be writing complex automation, but you understand how to leverage tools to improve operational efficiency. You can build a rough prototype of what's needed and partner with developers to develop tools as one team.
Follow-the-Sun Experience: Experience working in distributed operations teams with follow-the-sun coverage models. You understand how to hand off incidents cleanly, communicate operational status across time zones, and coordinate with global teams.

Salary & Benefits

Competitive total compensation package (salary and equity).
Retirement or pension plan in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy in line with local norms.

The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records, pursuant to applicable law.

You will receive a confirmation email once your application has been successfully accepted. If there is an error with your submission and you did not receive a confirmation email, please email us with your resume/CV, the role you've applied for, and the date you submitted your application. Someone from our recruiting team will be in touch.
