AI Safety Research Intern-1

Job Location

Redmond - USA

Hourly Salary

$30 - 50

Vacancy

1 Vacancy

Job Description

About Centific

Centific is a frontier AI data foundry that curates diverse, high-quality data using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About Job

Internship: AI Safety - Jailbreaking Attacks & Defense, Agentic AI, Human Behavior

(Ph.D. Research Intern)

Company: Centific
Location: Seattle, WA (or Remote)
Type: Full-time Internship - 40 hours per week

Build the Future of Safe and Responsible AI

Are you advancing the frontiers of AI safety, LLM jailbreak detection and defense, and agentic AI, with publications to show for it? Join us to translate pioneering research into robust, secure, and trustworthy LLM systems that resist adversarial and behavioral exploits.

The Mission

We're tackling cutting-edge AI safety across adversarial robustness, jailbreak defense, agentic workflows, and human-in-the-loop risk modeling. As a Ph.D. Research Intern, you'll own high-impact experiments from concept to prototype to deployable modules, directly contributing to our platform's security guarantees.

What You'll Do

  • Advance AI Safety: Design, implement, and evaluate attack and defense strategies for LLM jailbreaks (prompt injection, obfuscation, narrative red teaming).
  • Evaluate AI Behavior: Analyze and simulate human-AI interaction patterns to uncover behavioral vulnerabilities, social engineering risks, and over-defensive vs. permissive response tradeoffs.
  • Agentic AI Security: Prototype workflows for multi-agent safety (e.g., agent self-checks, regulatory compliance, defense chains) that span perception, reasoning, and action.
  • Benchmark & Harden LLMs: Create reproducible evaluation protocols/KPIs for safety, over-defensiveness, adversarial resilience, and defense effectiveness across diverse models (including the latest benchmarks and real-world exploit scenarios).
  • Deploy and Monitor: Package research into robust, monitorable AI services using modern stacks (Kubernetes, Docker, Ray, FastAPI); integrate safety telemetry, anomaly detection, and continuous red-teaming (a minimal service sketch follows this list).
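The posting doesn't prescribe an implementation, but to make the "Deploy and Monitor" bullet concrete, here is a minimal sketch of the kind of monitorable FastAPI safety-filter service it describes. The route name, toy regex patterns, and telemetry fields are hypothetical illustrations, not Centific's actual pipeline.

```python
# Minimal sketch of a monitorable safety-filter service, assuming FastAPI.
# The route, heuristic patterns, and log fields are hypothetical stand-ins
# for a learned jailbreak classifier and real telemetry.
import logging
import re

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
log = logging.getLogger("safety-telemetry")

# Toy patterns standing in for a trained detection model.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"pretend (you are|to be)", re.I),
]

class PromptIn(BaseModel):
    prompt: str

class Verdict(BaseModel):
    allowed: bool
    reason: str

@app.post("/check", response_model=Verdict)
def check(body: PromptIn) -> Verdict:
    """Flag prompts matching known jailbreak patterns and log telemetry."""
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(body.prompt):
            log.warning("blocked prompt: pattern=%s", pattern.pattern)
            return Verdict(allowed=False, reason=f"matched {pattern.pattern!r}")
    return Verdict(allowed=True, reason="no known pattern matched")
```

In a real deployment the regex list would be replaced by a classifier, and the log line would feed the anomaly-detection and continuous red-teaming loops mentioned above.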

Example Problems You Might Tackle

  • Jailbreaking Analysis: Systematically red-team advanced LLMs (GPT-4o, GPT-5, LLaMA, Mistral, Gemma, etc.), uncovering novel exploits and defense gaps.
  • Multi-turn Obfuscation Defense: Implement context-aware, multi-turn attack detection and guardrail mechanisms, including countermeasures for obfuscated prompts (e.g., StringJoin, narrative exploits); a de-obfuscation sketch follows this list.
  • Agent Self-Regulation: Develop agentic architectures for autonomous self-checking and self-correction, minimizing risk in complex multi-agent environments.
  • Human-Centered Safety: Study human behavior models in adversarial contexts: how users probe, trick, or manipulate LLMs, and how defenses can adapt without excessive over-defensiveness.
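To illustrate the StringJoin-style countermeasures mentioned above: these attacks insert separators between characters (e.g., "e-x-p-l-o-i-t") so keyword filters miss them. A common defense idea is to normalize the prompt before filtering. The separator set and blocklist below are hypothetical illustrations, not a production filter.

```python
# Minimal sketch of a normalization step against StringJoin-style obfuscation.
# Separator characters and the blocklist are toy examples; a real system
# would pair normalization with a learned classifier.
import re

# Collapse single separator characters sandwiched between word characters.
JOIN_SEPARATORS = re.compile(r"(?<=\w)[-+._*](?=\w)")
BLOCKLIST = {"exploit", "payload"}  # stand-in for a real filter

def normalize(prompt: str) -> str:
    """De-obfuscate: 'e-x-p-l-o-i-t' becomes 'exploit'.
    Note this also collapses legitimate hyphenated words; a sketch tradeoff."""
    return JOIN_SEPARATORS.sub("", prompt)

def flags_after_normalization(prompt: str) -> bool:
    """True if the de-obfuscated prompt hits the blocklist while the raw
    prompt does not, i.e., the obfuscation was doing real evasion work."""
    raw_hit = any(word in prompt.lower() for word in BLOCKLIST)
    norm_hit = any(word in normalize(prompt).lower() for word in BLOCKLIST)
    return norm_hit and not raw_hit

if __name__ == "__main__":
    print(flags_after_normalization("run the e-x-p-l-o-i-t now"))  # True
```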

Minimum Qualifications

  • Ph.D. student in CS/EE/ML/Security (or related); actively publishing in AI safety, NLP robustness, or adversarial ML (ACL, NeurIPS, Black Hat, IEEE S&P, etc.).
  • Strong Python and PyTorch/JAX skills; comfort with toolkits for language models, benchmarking, and simulation.
  • Demonstrated research in at least one of: LLM jailbreak attacks/defense, agentic AI safety, human-AI interaction vulnerabilities.
  • Proven ability to go from concept to code to experiment to result, with rigorous tracking and ablation studies.

Preferred Qualifications

  • Experience in adversarial prompt engineering and jailbreak detection (narrative, obfuscated, sequential attacks).
  • Prior work on multi-agent architectures or robust defense strategies for LLMs.
  • Familiarity with red-teaming, synthetic behavioral data, and regulatory safety standards.
  • Scalable training and deployment: Ray, distributed evaluation, CI/telemetry for defense protocols.
  • Public code artifacts (GitHub) and first-author publications, or strong open-source impact.

Our Stack (you'll touch a subset)

  • Modeling: PyTorch/JAX, Hugging Face, OpenMMLab, Mistral, LLaMA
  • Safety: Red-teaming frameworks, LLM benchmarking (SODE, ART), human behavior simulation
  • Systems: Python, Ray, Kubernetes, Docker, FastAPI, Triton, Weights & Biases
  • Defense Pipelines: Context-aware filtering, prompt-manipulation detection, anomaly telemetry

What Success Looks Like

  • A publishable outcome (with company approval) or a production-ready module measurably improving safety KPIs: adversarial robustness, over-defensiveness, and incident response latency (a sketch of how such KPIs might be computed follows this list).
  • Clean, reproducible code with documented ablations and end-to-end rerun reports for safety benchmarks.
  • A demo that communicates capabilities, limits, and next steps in defense and security assurance.
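The posting names the KPIs but not how they are measured; as one plausible reading, adversarial robustness and over-defensiveness are often tracked as attack success rate and over-refusal rate over a labeled evaluation run. The record format and field names below are hypothetical.

```python
# Minimal sketch of two safety KPIs from a labeled evaluation run.
# EvalRecord and its fields are hypothetical; real harnesses log much more.
from dataclasses import dataclass

@dataclass
class EvalRecord:
    is_attack: bool       # was the prompt adversarial?
    model_complied: bool  # did the model produce the requested content?

def attack_success_rate(records: list[EvalRecord]) -> float:
    """Fraction of adversarial prompts the model complied with
    (lower is better: a proxy for adversarial robustness)."""
    attacks = [r for r in records if r.is_attack]
    return sum(r.model_complied for r in attacks) / len(attacks)

def over_refusal_rate(records: list[EvalRecord]) -> float:
    """Fraction of benign prompts the model refused
    (lower is better: a proxy for over-defensiveness)."""
    benign = [r for r in records if not r.is_attack]
    return sum(not r.model_complied for r in benign) / len(benign)

if __name__ == "__main__":
    run = [
        EvalRecord(is_attack=True, model_complied=False),
        EvalRecord(is_attack=True, model_complied=True),
        EvalRecord(is_attack=False, model_complied=True),
        EvalRecord(is_attack=False, model_complied=False),
    ]
    print(attack_success_rate(run), over_refusal_rate(run))  # 0.5 0.5
```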

Why Centific

  • Real Impact: Your research ships, directly securing our core features and AI infrastructure.
  • Mentorship: Collaborate with Principal Architects and senior researchers in AI safety and adversarial ML.
  • Velocity & Rigor: Balance high-quality research with mission-critical product focus.

Compensation: $30-50 per hour

How to Apply

Email your CV, publication list/Google Scholar, and GitHub (or code artifacts/videos) to with the subject line:

Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.


Required Experience:

Intern

Employment Type

Full-Time

