Senior SRE Engineer – Cloud Operations

California, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Senior SRE Engineer – Cloud Operations

Remote, Americas
Full-time

We are recruiting on behalf of a fast-growing AI infrastructure company that builds a high-performance vector database powering semantic search, RAG pipelines, AI agents, and large-scale machine learning applications.

We are seeking a Senior Site Reliability Engineer (SRE) to join the Cloud Operations team and help ensure reliability, observability, and operational excellence across production cloud environments.

This role is highly operations-focused and ideal for engineers who enjoy owning system reliability, improving automation, and operating large-scale distributed systems in production.

About the Role

As a Senior SRE, you will be responsible for maintaining and improving production infrastructure while reducing operational risk and improving system reliability at scale.

You will work closely with platform engineering and infrastructure teams to ensure systems remain secure, performant, and highly available as customer usage grows.

Location Requirements

Remote, Americas (North, Central, or South America)
Candidates must be able to work primarily within American time zones

Key Responsibilities

Cloud Infrastructure & Operations

Operate and maintain production cloud infrastructure at scale
Manage Kubernetes clusters, networking, and deployment pipelines
Improve reliability, performance, and security of production systems

Monitoring & Observability

Enhance monitoring, logging, and alerting systems
Improve operational visibility and incident detection

Incident Response & Reliability

Lead incident response and root cause analysis
Implement preventive measures and continuous reliability improvements
Participate in on-call rotations

Automation & Process Improvement

Reduce operational toil through automation and tooling
Maintain and improve runbooks and operational procedures

Collaboration

Work closely with platform engineering and infrastructure teams
Support scalable architecture and operational best practices

Requirements

5 years of experience in DevOps, SRE, or infrastructure operations
Strong hands-on experience running Kubernetes in production
Solid understanding of:
- Linux systems
- Networking fundamentals
- Cloud infrastructure (AWS, GCP, or Azure)
Experience with monitoring, alerting, and incident management
Experience with infrastructure automation or infrastructure-as-code
Comfortable participating in on-call rotations
Strong communication and problem-solving skills

Preferred Qualifications

Experience with Terraform or similar IaC tools
Familiarity with Prometheus, Grafana, Loki, or OpenTelemetry
Scripting experience in Python, Bash, or Go
Experience in SaaS, cloud platforms, or data infrastructure environments
Exposure to security, compliance, or system hardening

What's Offered

Competitive compensation and benefits
Fully remote work environment
Flexible working hours
Opportunity to work on mission-critical cloud infrastructure
Collaborative, engineering-driven culture

How to Apply

If you are passionate about reliability engineering, cloud infrastructure, and large-scale distributed systems, we would love to hear from you.


About Company

Core Talent Finder
