Senior Infra Engineer Baremetal Orchestration
San Francisco, CA - USA
Job description
Our core mission at Railway is to make software engineers higher leverage. We believe that people should be given powerful tools so that they can spend less time setting up to do and more time doing.
Many infrastructure platforms focus only on how you deploy a single application, not on how these applications function in concert. Questions like "How do you build systems for zero-downtime deployment?" and "How do you do service-to-service communication?" are usually left up to the engineers to define.
At Railway, our goal is to be an all-encompassing solution to all these problems. As such, we take special care as we define our networking infrastructure.
"But the world would be a better place if more engineers, like me, hated technology. The stuff I design, if I'm successful, nobody will ever notice. Things will just work, and will be self-managing."
- Radia Perlman
About the role
For this role you will:
Build and maintain our host provisioning stack: PXE boot, Ansible, and burn-in agents that bring new bare metal online quickly and confidently
Continue to evolve our homegrown orchestration engine to manage clusters, containers, and VMs through a single lens
Optimize the efficiency of our bin packing algorithm to maximize utilization/performance and minimize costs
Own the internal tooling that Railway engineers use to interact with our fleet every day
Build out internal observability and alerting so we catch fleet problems before customers feel them
Design and maintain the CI pipelines that ship our infrastructure code safely
Define infrastructure that can be torn down, failed over, and reconstituted from scratch, following the principles of immutable infrastructure with Terraform and Ansible
Build Golang/Rust gRPC services from scratch capable of supporting millions of users
Write Engineering Requirement Documents to take something from idea, to defined tasks, to implementation, to monitoring its success
The arc of this role is more internal-facing than user-facing. You're building the platform that Railway engineers run on. This is a high-impact, high-agency role with direct effect on company culture, trajectory, and outcome.
About you
A strong understanding of distributed systems and what it takes to operate them. You enjoy building fault-tolerant, resilient, and scalable services, and you care about what happens when they break at 3am
Hands-on experience with bare metal provisioning, configuration management, and the unglamorous-but-critical work of getting hardware production-ready
Comfort building and operating internal tools. You understand that developer experience inside the company matters as much as the product outside it
A solid intuition about how long your solutions will last. For all systems at a startup, we can hope for 2-3 orders of magnitude of growth or 12-18 months
The tact to implement your solution, create monitors for its error boundaries, and document any requirements for when you're not around
A great sense of direction and prioritization when it comes to dealing with the ambiguity of an early-stage startup
A sense of grit to dive into a problem, implement a solution, scale that solution, and replace it when needed
A great set of communication skills for getting your point across, your solution implemented, and beyond
We value and love to work with diverse persons from all backgrounds
Things to know
For better or worse, we're a startup; our team dynamics are different from companies of different sizes and stages.
We're distributed ALL across the globe, and we're only going to become more distributed. As a result, stuff is ALWAYS happening.
We do NOT expect you to work all the time, but you'll have to be diligent about your boundaries because the end of your day may overlap with the start of someone else's.
We're a small team with high ownership who are not only passionate about what we do but seek to be exceptional as well. At the time of writing, we're 21, serving hundreds of thousands of users. There's a lot of stuff going on and a lot of ambiguity.
We want you to own it. We believe that ownership is a key to growth and part of that growth is not only being able to make the choices but owning the success or failure that comes with those choices.
Benefits and perks
At Railway, we provide best-in-class benefits: great salary, full health benefits including dependents, strong equity grants, equipment stipend, and much more. For more details, check back on the main careers page.
Beyond compensation, there are a few things that we believe make working at Railway truly unique:
Autonomy: We have very few meetings. Just a Monday and a Friday to go over the Company Board. We think your time is sacred, whether it's at work or outside of work.
Ownership: We're a company with a high-ownership, high-autonomy culture. We hope that you'll come in, help us, and over the course of many years do the best work of your life. When we bring you onboard, we expect you to change the company.
Novel problems/solutions: We're a startup that's well funded with cool problems, which lets us implement novel solutions! We abhor busywork and think that, whether it's community, engineering, operations, etc., there's always opportunity for creative and high-leverage solutions.
Growth: We want you to grow with us, but we know that talent is loaned, so when you figure out what area you want to grow in next, whether it's at Railway or outside, we'll make sure you land there.
How we hire
No tricks. No surprises. Here's the entire process.
1) Talk with us about the role
This is completely open-ended, and we're just trying to see who you are, what you want to do, and where you wanna go.
2) Work on a small project to discuss in the interview
Asynchronously implement the following:
Imagine a theoretical or actual system, like Railway, which can manage stateless and stateful compute workloads. Design the engine for managing orchestration.
Interview Structure (60 Minutes):
Pre-work (before your interview): Complete your solution (advised)
0-5m: introduction
5-50m: Building (or expanding) your solution
50-60m: Questions on Railway/Tech/etc
You can (and SHOULD!) ask us questions ahead of time. Ask away!
3) Review your solution with the Team
You'll sit down with someone on the team and go over the above. We'll poke into your solution as well as get you acquainted with two more members of the team.
Looking for: Your problem-solving skills. How you break down a problem and how you present a solution.
4) Meet the Team
You'll meet the Team, which will be comprised of 4 people from vastly different sections of the company.
Looking for: How you work with the rest of the team and communicate.
5) Chat with CEO
Sit down with our founder and CEO for 30 minutes. This is a 1:1, open-ended conversation.
6) Offer call
Finally, we will present the offer, hammer out the details about your position, tee up onboarding, and start our journey together.
Final Note: The interview goes both ways. Once again, please ask us things. Many things! Hard things. That's what we're here for.
Required Experience:
Senior IC
About Company
Railway is an infrastructure platform where you can provision infrastructure, develop with that infrastructure locally, and then deploy to the cloud.