Forward Deployed Engineer, RL Environments

Labelbox

Job Location:

San Francisco, CA - USA

Annual Salary: $140,000 - $200,000

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Shape the Future of AI

At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

About Labelbox

We're the only company offering three integrated solutions for frontier AI development:

Enterprise Platform & Tools: Advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale
Frontier Data Labeling Service: Specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models
Expert Marketplace: Connecting AI teams with highly skilled annotators and domain experts for flexible scaling

Why Join Us

High-Impact Environment: We operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.
Technical Excellence: Work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.
Innovation at Speed: We celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.
Continuous Growth: Every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.
Clear Ownership: You'll know exactly what you're responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.

The Role

We're hiring a Forward Deployed Engineer to own the design, development, and operationalization of reinforcement learning environments. You'll build the sandboxed, reproducible execution environments that AI agents interact with during training and evaluation: things like terminal-based task benchmarks, browser and computer-use environments, and tool-augmented agentic workspaces.

This is a hands-on engineering role. You'll write production-quality infrastructure code, integrate with open-source RL tooling, and work closely with our data operations team to ensure environments are robust, observable, and ready for human annotators and model agents alike. You won't be doing ML research, but you'll need to deeply understand how RL training loops consume environments and where the bottlenecks live.

What You'll Do

Design, build, and maintain sandboxed RL environments for agentic AI training, including terminal emulators, browser automation harnesses, computer-use simulators, and tool-augmented workspaces (e.g., environments built on frameworks like TerminalBench, OSWorld, and Tau-bench)
Develop reproducible, containerized execution environments (Docker, VMs, lightweight sandboxes) that support deterministic task rollouts and reward signal collection
Integrate with and extend open-source agentic tooling and custom CLI/API harnesses to enable multi-step agent interaction
Build instrumentation and observability layers (structured logging, trajectory capture, state snapshotting) so training runs and human annotation sessions produce clean, auditable data
Collaborate with data operations to design task curricula and evaluation protocols that stress-test model capabilities across environment types
Own environment deployment and reliability: CI/CD pipelines, automated testing of environment configurations, and monitoring for drift or breakage across versions
Rapidly prototype new environment types as client and internal requirements evolve, moving from spec to working system in days, not weeks

What We're Looking For

Required

2 years of professional software engineering experience with strong fundamentals in Python and at least one systems-level language (Go, Rust, C)
Demonstrated experience with containerization and sandboxing (Docker, Podman, Firecracker, or similar) in production or near-production contexts
Familiarity with RL concepts: MDPs, reward shaping, episode structure, observation/action spaces. You don't need to have trained models, but you need to understand what an environment must provide to an RL training loop
Experience building or maintaining developer tooling, CLI tools, or infrastructure automation
Comfort working with browser automation frameworks or terminal interaction tooling
Strong debugging instincts: you can trace failures across process boundaries, container layers, and network calls
Ability to read and implement from academic papers and open-source benchmark repositories without extensive hand-holding

Preferred

Direct experience building or contributing to RL environments (Gymnasium/Gym, PettingZoo, or custom environment implementations)
Experience with agentic AI evaluation frameworks (SWE-bench, WebArena, OSWorld, TerminalBench, or similar)
Familiarity with GCP or AWS infrastructure (Compute Engine, ECS/EKS, Cloud Build)
Prior work at an AI data company, ML platform company, or AI research lab
Contributions to open-source projects in the RL, agents, or dev-tools space

Candidate Archetype

The ideal candidate is a strong software engineer first, with genuine curiosity and working knowledge of reinforcement learning. You've probably built infrastructure or developer tooling at a startup or mid-stage company, and you've been pulled toward the ML/AI space, maybe through side projects, open-source contributions, or a prior role adjacent to an ML team. You're the kind of engineer who reads an RL benchmark paper and immediately thinks about how to make the environment more robust, not how to improve the policy gradient.

You thrive in ambiguity. You can take a loosely defined project requirement ("build an environment that tests an agent's ability to navigate a file system and execute multi-step bash workflows") and deliver a working, tested, documented system without needing a detailed spec. You move fast, but you care about reliability because you know environments that break silently poison training data.

Why This Role Matters

RL environment quality is one of the biggest bottlenecks in agentic AI training today. Environments that are brittle, non-deterministic, or poorly instrumented produce noisy reward signals that directly degrade model performance. You'll be solving one of the highest-leverage infrastructure problems in AI.
You'll work across a portfolio of projects spanning different AI labs and model capabilities, with no single-product monotony. The environment types you build will evolve as the frontier of agent capabilities moves.
Alignerr is a small, high-impact team inside a well-funded company (Labelbox). You'll have startup-level ownership with growth-stage resources.

Alignerr Services at Labelbox

Alignerr is Labelbox's human data organization, purpose-built to generate the high-quality training data that powers the next generation of AI models. We partner directly with leading AI labs to produce reinforcement learning environments, evaluation benchmarks, and expert-annotated datasets that push model capabilities forward. Our team sits at the intersection of software engineering, ML infrastructure, and human-in-the-loop data production.

Labelbox strives to ensure pay parity across the organization and to discuss compensation transparently. The expected annual base salary range for United States-based candidates is below. This range is not inclusive of any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location.

Annual base salary range

$140,000 - $200,000 USD

Life at Labelbox

Location: Join our dedicated tech hubs in San Francisco or Wrocław, Poland
Work Style: Hybrid model with 2 days per week in office, combining collaboration and flexibility
Environment: Fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making
Growth: Career advancement opportunities directly tied to your impact
Vision: Be part of building the foundation for humanity's most transformative technology

Our Vision

We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs.

Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.

Your Personal Data Privacy: Any personal information you provide Labelbox as a part of your application will be processed in accordance with Labelbox's Job Applicant Privacy notice.

Any emails from Labelbox team members will originate from a @ email address. If you encounter anything that raises suspicions during your interactions, we encourage you to exercise caution and suspend or discontinue communications.

About Company

Labelbox

Denean Kelson, ergonomics consultant and productivity software enthusiast
