Infrastructure engineer (UK)
Job Summary
About WRITER
WRITER is where the world's leading enterprises orchestrate AI-powered work. Our vision is to expand human capacity through superintelligence. And we're proving it's possible through powerful, trustworthy AI that unites IT and business teams to unlock enterprise-wide transformation. With WRITER's end-to-end platform, hundreds of companies like Mars, Marriott, Uber, and Vanguard are building and deploying AI agents that are grounded in their company's data and fueled by WRITER's enterprise-grade LLMs. Valued at $1.9B and backed by industry-leading investors including Premji Invest, Radical Ventures, and ICONIQ Growth, WRITER is rapidly cementing its position as the leader in enterprise generative AI.
Founded in 2020, WRITER has office hubs in San Francisco, New York City, Seattle, Austin, Chicago, and London. Our team thinks big and moves fast, and we're looking for smart, hardworking builders and scalers to join us on our journey to create a better future of work with AI.
About the role
At WRITER, our mission to expand human capacity with superintelligence relies on a foundational truth: our platform must be available, performant, and reliable 24/7. As an infrastructure engineer, you'll be at the heart of making this a reality, impacting every enterprise customer who trusts us with their AI-powered workflows. This isn't just about keeping the lights on; it's about pushing the boundaries of what's possible, proactively identifying and solving complex systemic challenges, and laying the groundwork for our rapid growth and the evolving demands of enterprise generative AI. You'll build resilient systems, automate across the stack, and champion reliability best practices, directly enabling our ambitious product roadmap and ensuring our customers always have access to the powerful tools they need.
This is a hybrid position based out of our New York City or London hubs. You'll report to our director of engineering.
What you'll do
Technical
Breadth across disciplines. Bring deep focus to one problem at a time, with the breadth to move between SRE, DevOps, Infrastructure, and Platform work over a quarter or two as the leverage shifts. This is not a thrash-every-week role: most of the time you're heads-down on one substantial initiative (the on-call posture, the release pipeline, the multi-region Terraform layout, the internal platform surface). Cross-layer fluency is what lets you pick the right next initiative; it isn't a weekly context-switch.
Simplicity / via negativa. Challenge the status quo and remove toil before adding features; automate operational tasks and infrastructure management with Python or Go; reject tools that don't fit the problem; and treat manual on-call work as a defect to be designed out, not a status quo to be staffed up.
Breadth across the stack. Design scalable, fault-tolerant infrastructure across AWS (preferred), GCP, and Azure, working fluently across Kubernetes, Helm, Terraform, and the supporting cloud and AI tooling that backs WRITER's high-traffic platform.
AI in workflow. Run agents in your daily loop (Claude Code, Droid, Codex, internal skills) to investigate incidents, draft Terraform/Helm changes, write runbooks, scaffold tooling, and review PRs. Build the agentic setup as a collective surface: humans and digital teammates working as one team with shared skills, shared context, and shared on-call workflows. Encode recurring infra tasks as internal skills any teammate (human or agent) can pick up and run, so the team's throughput compounds, not just your own.
Debugging fluency. Lead incident response, post-mortems, and root-cause analyses; trace failures to the underlying problem (never the symptom); apply the learning back into the architecture; and prevent the same incident from happening twice.
Non-technical
End-to-end ownership. Own the reliability, performance, and efficiency of WRITER's core services end-to-end; define and uphold the SLOs and error budgets; carry the on-call pager; and stand behind the outcome metric, not just the system you shipped.
Strategic vs. tactical balance. Balance this week's critical work with the 6–12-month platform direction: ship the fix for whatever is driving on-call load today while shaping the multi-year observability, cost, and reliability investments that matter to WRITER's enterprise customers.
Cross-functional collaboration. Operate at the seams with product, security, and engineering peers; provide expert guidance on system design for reliability, performance, and scalability from conception through launch; connect the infra agenda to product and revenue context; and disagree with evidence, not volume.
What you need
Technical
Track record. 5 years of experience in infrastructure engineering, DevOps, or a similar role focused on building and operating large-scale, high-availability production systems at a high-growth product company.
Breadth. Experience running containerised workloads in production (a real cluster, not a lab); hands-on experience with Helm and with Terraform or Pulumi on at least one major cloud (AWS preferred); and good proficiency in Python or Go for automation and tooling.
AI in workflow. AI is part of how you ship, not a thing you've read about: agentic tooling (Claude Code, Droid, Codex, internal skills) is in your daily loop, you've built or adopted AI-assisted workflows others now use, and you have strong opinions on where it's unreliable. This is a hard requirement, not a bonus. Candidates whose actual daily workflow does not already include AI tooling will not be advanced.
First-principles decision-making. Demonstrated ability to challenge the status quo, proactively identify systemic weaknesses, and propose innovative solutions to complex reliability problems; reason from constraints and failure modes (not analogy or vendor defaults); name the tradeoff in business terms (reliability vs. velocity, cost vs. blast radius, standardisation vs. one-off); and reject the "best practices" answer when it doesn't fit the problem.
Reversibility & blast-radius. Make reversible calls by default, write the rollback before you touch production, work fluently with monitoring and logging stacks (Prometheus, Grafana, ELK, or equivalent), and stress the system in safe places so it comes back stronger.
Non-technical
Cross-functional collaboration. Excellent communication, collaboration, and problem-solving skills, with a talent for building strong relationships and connecting with cross-functional teams. You surface non-goals before anyone asks and partner with product, security, and platform peers as one delivery surface.
Autonomy & end-to-end ownership. A strong sense of ownership and accountability, and an eagerness to own mission-critical systems and drive them toward peak performance and unparalleled reliability. At least one 0-to-1 infrastructure build you owned end-to-end, with the outcome metric attached.
Bonus if you have
Software-engineering depth. A software-engineering background, not only config and scripting: you've designed, built, and shipped non-trivial production code (services, libraries, internal frameworks) in Python, Go, or a comparable language; you can read and modify the codebases your infrastructure runs; and you move between infra automation and feature engineering without changing brains.
Benefits & perks (UK full-time employees):
Generous PTO plus company holidays
Comprehensive medical and dental insurance
Paid parental leave for all parents (16 weeks)
Fertility and family planning support
Early-detection cancer testing through Galleri
Competitive pension scheme and company contribution
Annual work-life stipends:
Wellness stipend for gym, massage/chiropractor, personal training, etc.
Learning and development stipend
Company-wide off-sites and team off-sites
Competitive compensation and company stock options
Required Experience:
IC
About Company
Eliminate silos with an end-to-end agent builder platform, designed for collaboration—without compromise. Build, activate, and supervise agents.