Since our founding in July 2022, we've grown quickly to 30 staff, producing over 40 influential academic papers and establishing leading AI safety events. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and features in the Financial Times, Nature News, and MIT Technology Review.
We drive practical change through red-teaming with frontier model developers and government institutes. Most recently, we discovered major issues with Anthropic's latest model the same day it was released, and worked with OpenAI to safeguard their latest model. Additionally, we help steer and grow the AI safety field by developing research roadmaps with renowned researchers such as Yoshua Bengio; running an AI safety-focused co-working space in Berkeley housing 40 members; and supporting the community through targeted grants to technical researchers.
Our research team likes to move fast. We explore promising research directions in AI safety and scale up only those showing high potential for impact. Unlike other AI safety labs that take a bet on a single research direction, we aim to pursue a diverse portfolio of projects. Our model is to conduct initial investigations into a range of high-potential areas. We incubate the most promising directions through a combination of in-house research, field-building events, and targeted grants. Once the core research problems are solved, we work to scale them into a minimum viable prototype, demonstrating their validity to AI companies and governments to drive adoption.
Our current focus areas include:
Mitigating AI deception: Studying when lie detectors induce honesty or evasion, and developing model organisms for deception and sandbagging.
Evaluating and red-teaming: Conducting pre- and post-release adversarial evaluations of frontier models (e.g., Claude 4 Opus, ChatGPT Agent, GPT-5); developing novel attacks to support this work; and exploring new threat models (e.g., persuasion, tampering risks).
Robustness: Working to rigorously solve these security problems by building a science of security and robustness for AI, from demonstrating that superhuman systems can be vulnerable through to scaling laws for robustness.
Explainability: Developing foundational techniques such as codebook features and AC/DC, and applying them to understand core safety problems like learned planning.
This role would be a good fit for an experienced machine learning engineer or an experienced software engineer looking to transition to AI safety research. All candidates are expected to:
Have significant software engineering experience. Evidence of this may include prior work experience and open-source contributions.
Be fluent working in Python.
Be results-oriented and motivated by impactful research.
Bring prior experience mentoring other engineers or scientists in engineering skills.
Additionally, candidates are expected to bring expertise in one of the following areas, corresponding to the core competencies our different research teams most need:
Option 1: Machine Learning
Substantial experience training transformers with common ML frameworks like PyTorch or JAX.
Good knowledge of basic linear algebra, vector calculus, probability, and statistics.
Option 2: High-Performance Computing
Power user of cluster orchestrators such as Kubernetes (preferred) or SLURM.
Experience building high-performance distributed systems (e.g., multi-node training, large-scale numerical computation).
Experience optimizing and profiling code (ideally including on GPUs, e.g., CUDA kernels).
Option 3: Technical Leadership
Experience designing large-scale software systems, whether as an architect in greenfield software development or leading a major refactor.
Comfortable project-managing small teams, such as chairing stand-ups and developing detailed roadmaps to execute on a 3-6 month research vision.
As a Member of Technical Staff (Senior Research Engineer), you would join one of our existing workstreams and lead projects there:
Detecting and preventing deception. Under what conditions can we reliably detect deceptive behaviour from models, and can such behaviour be effectively mitigated at scale? This workstream would focus on large-scale training of transformers.
Preventing catastrophic misuse. Apply our research insights to detect and mitigate vulnerabilities and other risks in frontier AI models. This workstream would focus more on technical leadership.
Accelerating our research. Build frameworks and infrastructure that allow us to ask bigger questions and run new experiments more rapidly to deepen our research. This workstream would focus more on high-performance computing.
As we continue to grow our research portfolio, additional workstreams may open up for contribution, for example in mechanistic interpretability.
If based in the USA, you will be an employee of a 501(c)(3) research non-profit. Outside the USA, you will be an employee of an employer-of-record (EoR) organization on behalf of .
Location: Both remote (global) and in-person (Berkeley, CA) are possible. We sponsor visas for in-person employees and can also hire remotely in most countries and time zones, provided you are willing to overlap with 2 hours of the Berkeley working day.
Hours: Full-time (40 hours/week).
Compensation: $150,000-$250,000/year depending on experience and location, with the potential for additional compensation for exceptional candidates. We will also pay for work-related travel and equipment expenses. We offer catered lunch and dinner at our offices in Berkeley.
Application process: a 72-minute programming assessment, a short screening call, two 1-hour interviews, and a 1-2 week paid work trial. If you are not available for a work trial, we may be able to find alternative ways of testing your fit.
If you have any questions about the role, please do get in touch at .
Required Experience:
Senior IC