Senior Site Reliability Engineer / DevOps Engineer

Prophet Town

Job Location:

Mountain View, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Senior Site Reliability Engineer (SRE) / DevOps Engineer

Location: Onsite, Mountain View, CA

Experience Required: 5 years

Infrastructure Footprint: Global production infrastructure across AWS, South America, and Europe

Role Type: Hands-on engineering role

Role Overview

Seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available global infrastructure supporting production systems across multiple international regions.

This role is for an engineer with 5 years of experience building and running production-grade cloud infrastructure. The right person understands where distributed systems fail and has learned the hard lessons that come from operating Kubernetes and cloud platforms at scale.

The ideal candidate has deep hands-on experience with Kubernetes, ArgoCD, Terraform, CI/CD pipelines, AWS infrastructure, and multi-region platform reliability. They should understand the limitations, sharp edges, and operational failure modes of these tools.

This is an onsite role working closely with platform engineering and leadership to build resilient global infrastructure.

What You'll Do

Global Infrastructure Architecture

Design and operate globally distributed production infrastructure across AWS regions and physical data center environments in South America and Europe
Build highly available multi-region systems with strong disaster recovery and failover strategies
Solve cross-region networking, latency, DNS routing, replication, and reliability challenges

Kubernetes Platform Engineering

Build, scale, secure, and troubleshoot production Kubernetes clusters
Handle cluster lifecycle management, upgrades, node failures, networking issues, storage problems, and control-plane troubleshooting
Tune workloads for resiliency, scheduling efficiency, autoscaling behavior, and resource optimization
Debug real-world Kubernetes issues including:
- etcd instability
- networking overlays and CNI failures
- ingress/controller edge cases
- persistent volume failures
- node pressure and eviction behavior
- cluster upgrade regressions

GitOps / ArgoCD Operations

Design and maintain GitOps workflows using ArgoCD
Manage promotion pipelines across environments and regions
Resolve drift detection issues, sync conflicts, reconciliation failures, and deployment ordering challenges
Build safe rollback and progressive deployment strategies

Candidates should know why ArgoCD breaks, not just how to click Sync.

Infrastructure as Code

Build and maintain reusable Terraform modules for multi-region infrastructure
Manage state strategy, workspace isolation, secrets handling, and provider complexity
Solve real-world Terraform pain points including:
- state corruption and locking conflicts
- module version drift
- provider upgrade regressions
- dependency graph surprises
- cross-account provisioning complexity

CI/CD Engineering

Build and optimize production CI/CD pipelines
Improve deployment speed, safety, and repeatability
Troubleshoot flaky pipelines, artifact inconsistencies, race conditions, environment drift, and rollback failures

Reliability & Observability

Establish SLIs/SLOs and production health standards
Build alerting, monitoring, tracing, and incident response workflows
Lead root cause analysis and postmortem improvements
Reduce operational toil through automation

Why This Role

You'll own foundational infrastructure decisions for globally distributed systems and help build resilient platform capabilities at international scale.

This is a hands-on engineering role for someone who wants meaningful ownership and complex technical problems.

Requirements

Required Experience

5 years in Site Reliability Engineering, DevOps, or Platform Engineering
Deep production experience with:

Kubernetes
ArgoCD
Terraform
AWS
CI/CD systems
Linux systems administration
Infrastructure automation

Preferred Experience

Experience operating infrastructure across multiple continents
Experience with hybrid cloud or physical data center integration
Strong networking knowledge, including BGP, VPNs, routing, DNS, and load balancing
Experience with security hardening and compliance in production systems
Software engineering background with Go, Python, or Bash

What Senior Means Here

You have enough production experience to have strong opinions because you have seen failures firsthand.

You know:

why Terraform plans sometimes lie
why ArgoCD syncs can fail for non-obvious reasons
why Kubernetes upgrades can ruin your week
why "works in staging" means very little
why multi-region failover diagrams often fail in production
why observability usually breaks exactly when needed most

You've solved these problems repeatedly and improved systems because of those lessons.

