Site Reliability Engineer

Employer Active

1 Vacancy

Job Location

San Francisco, CA - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Ready to shape the future of AI infrastructure and build systems that power the most advanced unstructured data pipelines in the world?
At Unstructured, we're building the backbone of generative AI, enabling companies to transform PDFs, HTML, Word docs, images, and more into high-performance data pipelines that scale. Our tools are already used by half of the Fortune 500, and our open-source package has been downloaded 26 million times. Now we're entering our next chapter, and we're hiring a Site Reliability Engineer to help scale our systems and safeguard our infrastructure.

If you're energized by reliability, love solving infrastructure challenges at scale, and want to help define how modern AI systems run in production, this is your moment. You'll work closely with Engineering, Product, and Customer teams to build scalable systems, streamline CI/CD, and make reliability a first-class citizen across everything we deploy.

This role is hybrid in San Francisco: join us in-office 3x a week for deep collaboration, whiteboard sessions, and hands-on impact.

What You'll Own & Drive

Scale & Stability at the Core
Design and implement highly available, observable, and scalable infrastructure across cloud environments
Build resilient systems that meet the demands of enterprise-grade production AI workloads

Automate Everything
Develop Infrastructure-as-Code using Terraform, Pulumi, and others
Own CI/CD automation and build reusable pipelines with GitHub Actions and modern DevOps tooling

Own Kubernetes & Orchestration
Manage and optimize our Kubernetes clusters and containerized environments
Tune Helm charts, service mesh configs, and orchestration systems for performance and security

Obsess Over Observability
Implement and maintain monitoring, logging, and alerting with tools like Prometheus, Grafana, Datadog, and Elastic
Ensure we can see, understand, and respond to system behavior in real time

Drive Production Readiness
Partner with engineering to prepare features and systems for production rollouts
Contribute to capacity planning, deployment strategies, and fault-tolerant system design

Lead Incident Response
Support and lead incident response processes, postmortems, and root cause analysis
Champion a culture of blameless retrospectives and continuous improvement

Accelerate Engineering Velocity
Improve developer experience through tooling, automation, and streamlined feedback loops
Help teams move faster without sacrificing quality or uptime

What You Bring
-4 years in SRE, DevOps, or Infrastructure Engineering roles supporting high-scale production environments
-Deep experience with cloud platforms like AWS, GCP, or Azure
-Expertise in Kubernetes, Docker, and container orchestration at scale
-Strong Linux systems and networking fundamentals
-Scripting and automation skills (Python, Bash, or Go preferred)
-Proficiency with Infrastructure-as-Code (Terraform, Pulumi, Ansible, or similar)
-Solid understanding of monitoring and observability best practices
-A calm, systems-thinking approach to incident response and reliability

Bonus Points
-Experience supporting ML infrastructure or real-time data pipelines
-Exposure to serverless or event-driven architectures
-Contributions to open-source DevOps projects or communities
-Familiarity with security and compliance in cloud-native environments

Why You'll Love It Here
Impact That Matters: Own the core infrastructure behind AI systems used by the Fortune 500
Big Technical Challenges: Solve hard, meaningful problems at the cutting edge of cloud and data
Elite Team: Join a sharp, humble group of engineers who value execution and impact
SF Office Vibes: Collaborate live with real whiteboards and real humans (not just Slack threads)
Flexible Culture: Hybrid structure with async-friendly, low-ego collaboration
$190,000 - $250,000 a year
This role's salary is benchmarked against San Francisco market rates to remain competitive with top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.

Employment Type

Full-Time

Company Industry

Department / Functional Area

Engineering
