About Fluidstack
At Fluidstack, we're building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises, including Mistral, Poolside, Black Forest Labs, Meta, and more, to unlock compute at the speed of light.
We're working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If you're motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.
About the Role
Fluidstack is seeking a Network Engineer, Reliability & Observability, to serve as a reliability engineer, championing and building process, data collections, and reliability metrics with the objective of improving the quality and reliability of AI networks from deployment through the full lifecycle of operations.
This role is focused on developing processes, systems, tools, data and data pipelines, and observability to improve the quality of networks and deliver automated metrics (24x7) as well as periodic reliability reports for both internal and external customers.
This role is ideal for experienced network operators who are passionate about reliability and have experience designing and building full lifecycle software such as Quality Assurance audits, circuit audits, periodic audits, failure rates, and failure analysis. You are passionate about hardware (electronics and optics) and software development, and you value and promote the use of data to make informed decisions in deployment, operations, and strategic sourcing.
Experienced SREs (Site Reliability Engineers) with a passion for networking are encouraged to apply.
Focus
Ownership of Quality Assurance: Design, develop, and support QA processes for network hardware and networks.
Pipelines: Develop and deploy serverless workflows and server-based and manually triggered data pipelines, producing network quality and reliability observability for internal and external customers.
Deployment and Operations Support: Support full lifecycle data collection and analysis, partnering with Deployment, Operations, DC, hardware, and logistics teams to produce data that drives process improvements and delivers on SLAs and SLOs.
Process Engineering: Develop, pilot, and deploy process improvements for deployment and repair to produce data and consume data with Machine Learning to fulfill our mission.
Cross-Team Collaboration: Own without ego and execute in a collaborative team with design, deployment, and operations engineers and software developers.
Subject Matter Expert: In at least two deep subjects such as IP routing, optics, optical transport, Ethernet, RDMA/RoCE, or electrical power.
About You
Strong Operations Background: 5 years in network engineering and at least 3 years in operations with significant hands-on operational experience. You've run production networks or compute, responded to incidents at all hours, and debugged complex failures under pressure. You understand the difference between working and production-ready.
Software Development: You have experience with ITIL, Agile (XP), and TDD, including developing and leading programs and projects. You have experience building hyperscale platforms in Golang with supporting tools in Python or Rust.
Datacenter Fabric Expertise: Deep experience operating modern datacenter networks, including EVPN/VXLAN, BGP, Clos topologies, and high-radix switching. You're comfortable troubleshooting Layer 2/3 issues, BGP routing problems, fabric misconfigurations, and physical media failures.
Incident Response Excellence: Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution. You remain calm during outages, communicate clearly with stakeholders, and know when to escalate versus when to dig deeper. You've been the person others call when things break.
Matrix Leadership Experience: You understand how to build relationships with onsite teams, coordinate physical infrastructure work, and represent network engineering in a field environment. You know how to get things done in operational settings with many internal and external teams and stakeholders.
Operational Pragmatism: You balance perfection with progress. You can troubleshoot with imperfect information, make pragmatic decisions under time pressure, and prioritize based on business impact. You document as you go and continuously improve operational processes.
Self-Driven: You embrace complex challenges with undefined process and key results. You can dive in to learn but zoom back out to build Objectives, develop Key Results, and build a software development project and pipeline in Jira, solo. You can then switch hats and begin coding.
Nice to Haves
AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.
Reliability Engineering: You have experience with observability and reliability engineering from network operations or manufacturing quality.
Hardware Repair Experience: Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work. You understand datacenter logistics, vendor escalation processes, and how to work effectively with onsite technicians.
Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection. You've used monitoring tools to diagnose issues proactively and tune alerting to reduce noise. You have experience with SQL, MySQL, and building operations dashboards.
Salary & Benefits
Competitive total compensation package (salary and equity).
Retirement or pension plan in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy in line with local norms.
The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.
We are committed to pay equity and transparency.
Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records, pursuant to applicable law.
You will receive a confirmation email once your application has successfully been accepted. If there is an error with your submission and you did not receive a confirmation email, please email us with your resume/CV, the role you've applied for, and the date you submitted your application; someone from our recruiting team will be in touch.
Required Experience:
IC