Staff AI Engineer
Job Summary
We believe conversations will become the #1 way to shop.
At Gorgias, we're building the platform that makes this real: a unified AI agent that sells, supports, and re-engages customers across the entire journey. Conversational Commerce is the future of ecommerce, and we're leading that shift.
Our mission is to turn every interaction between a brand and its customers into a relationship: personal, seamless, and intelligent. By combining deep product expertise with the latest in AI, we're making shopping feel more natural, human, and connected than ever before.
To win, we focus relentlessly on:
Quality: conversations that feel authentic and on-brand.
Experience: effortless shopping from chat to checkout.
Re-engagement: personal 1-1 dialogue instead of noisy marketing.
The opportunity is massive. As AI reshapes how people buy, Gorgias is building the foundation for the next decade of ecommerce, where every brand has its own intelligent agent and every customer feels understood.
Join us to make Conversational Commerce real.
Team & Context
Gorgias is an AI-first company building products powered by LLMs and agent-based systems.
As we scale our AI capabilities, we need to improve how we evaluate, iterate on, and operate these systems in production. Today, parts of this process remain manual or fragmented, especially around prompt iteration, validation, and evaluation workflows.
This role will focus on building and scaling the systems that support AI evaluation and iteration, helping the team move faster and more reliably.
About the Role
You'll have a chance to:
Work on production AI systems used by thousands of businesses
Define how we evaluate and improve AI performance at scale
Build internal platforms and tooling used by AI and engineering teams
Reduce manual processes and improve iteration speed on AI features
Collaborate across AI, ML, and product teams
Raise the engineering bar and mentor others
What You'll Do
1. Architect the Evaluation Factory
End-to-End Platform Ownership: Architect and lead the development of our internal evaluation platform, moving us from manual testing to a fully automated lifecycle (from LLM-as-a-judge creation to production monitoring).
Accelerate Time-to-Market: Directly impact our primary KPI by designing tools and workflows that drastically reduce the time it takes to deliver a calibrated, production-ready agent.
Infrastructure Collaboration: Partner with the Orchestration team to build the robust, scalable infrastructure required to run complex evals and agentic simulations.
2. Scaling AI Expertise
Squad Empowerment: Serve as the AI Technical Lead for product squads, guiding them through the complexities of agent design, failure analysis, and prompting best practices.
Decentralize Quality: Instead of being a bottleneck, you will build the "paved road" that allows product squads to become autonomous in measuring and maintaining their own agent quality.
Standard Setting: Define what "good" looks like for AI at Gorgias. You'll translate non-deterministic AI behavior into predictable engineering metrics that the whole organization can trust.
3. Engineering Leadership
Mentor & Level Up: Bridge the gap between traditional software engineering and AI. You'll mentor engineers on how to apply rigorous system design to the world of LLMs and agents.
Continuous Observability: Take ownership of the feedback loop, ensuring that production insights from our agents directly inform the next iteration of our evaluation datasets.
Who You Are
8 Years of Engineering Excellence: You are a Staff-level engineer first. You've built systems that handle high scale, and you know how to architect for long-term maintainability and performance.
Agentic Curiosity: You've moved beyond the "chatbot" phase and are actively experimenting with AI agents. You understand that the challenge isn't the prompt but the orchestration, state management, and reliability of the agent's actions.
Systems Thinker (Non-Deterministic Mindset): You recognize that AI is probabilistic. You are excited by the challenge of building deterministic wrappers and evaluation loops around models to make them safe for production.
The Applied Edge: You likely come from a background in distributed systems, internal platforms, or developer tooling, and you're now applying that rigor to the AI stack.
What We're Looking For
Beyond the Wrapper: You have serious experience moving beyond simple API calls to architecting multi-stage AI orchestrations (agents, chained workflows, or custom runtime logic).
Orchestration Experience: Even if you aren't an AI researcher, you have experience building complex, multi-step workflows (e.g., temporal systems, state machines, or event-driven architectures) and want to apply this to agentic loops.
Reliability Obsession: You understand why "vibes-based" testing doesn't work. You've started exploring or building eval frameworks to measure how models perform against real-world data.
Infrastructure Mindset: You are comfortable with the "glue" that makes AI work: vector databases, semantic caching, and API integration with third-party tools.
Tech Stack & Experience
Strong backend experience (Python preferred)
Experience with distributed systems and event-driven architectures
Familiarity with tools like Kafka, Pub/Sub, or equivalent
Experience working with LLMs (prompting, RAG, agents, evaluation workflows)
Experience building APIs and scalable services
Understanding of monitoring, observability, and system performance
Hiring Process
Recruiter phone screen
Hiring Manager Interview
System Design Interview
AI Case Study (take-home 12 hours)
Technical Deep Dive of case study
Final Leadership Interview
Perks and Benefits
Competitive salary & equity (90th percentile worldwide; go check our public salary calculator)
5 weeks of vacation plus 2 weeks of RTT (we follow each country's applicable PTO laws)
Paid sick leave
Paid parental leave (16 weeks)
MacBook Pro
Personal credit card to buy lunches (we use Swile)
We provide private health insurance (we use GAN)
Get up to 700 to set up your workstation at home (working from home should feel breezy)
Get up to 2000 of learning material and wellness support per year! This includes 1500 for learning material (such as books, courses, and individual coaching sessions) directly linked to your job scope, as well as a 500 wellness budget. Take advantage of these resources to grow in your role and prioritize your personal development and wellness.
Every quarter, we organize an online company-wide summit to discuss where we're going and strengthen social bonds. Once per year, we organize offsite team retreats and company retreats!
AI at Gorgias
At Gorgias, AI is a natural extension of how we work and build. Our teams use it every day to research, write, analyze, code, and craft better customer experiences. Everyone has access to premium AI tools (ChatGPT, Claude, Granola & others) and an annual L&D budget to explore new ones.
The real magic happens when we share what we learn. Our #powerup Slack channel is a digital petri dish of new tools and workflows, and each team has AI champions who showcase fresh ideas during weekly company-wide standups, which are now practically AI demo sessions.
We see AI not as a replacement for creativity or empathy but as a multiplier, helping us move faster, think deeper, and serve customers better.
AI use in Recruiting at Gorgias
By submitting your application, you agree that Gorgias may collect and process your personal data for recruiting, workforce planning, and related purposes. For more information about how we process your data and your rights, please refer to our Applicant Privacy Policy.
Diversity & Inclusion at Gorgias
We're committed to creating an inclusive environment where everyone can thrive. We welcome applicants from all backgrounds, experiences, and perspectives, because diverse teams drive innovation and better decision-making.
If you need accommodations during the application or interview process, please contact us.
Required Experience:
Staff IC
About Company
Gorgias is the leading AI customer experience platform for ecommerce stores, trusted by over 15,000 merchants worldwide.