Senior DevOps Engineer / Site Reliability Engineer

Thomas Talent Network

Job Location:

Raleigh, NC - USA

Monthly Salary: Not Disclosed

Posted on: 3 hours ago

Vacancies: 1 Vacancy

Job Summary

A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North America operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform.

This is a newly created role supporting our North America team. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment.

KEY RESPONSIBILITIES:

Design, develop, and maintain unified operation and platform management systems covering resource management, monitoring & alerting, configuration management, and automated operation & maintenance

Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to achieve intelligent O&M

Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)

Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades

Promote SRE concepts and engineering practices; organize technical sharing and training; build a reliability engineering system

Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms

REQUIRED QUALIFICATIONS:

Currently residing in California or North Carolina, USA

US Green Card or US Citizenship (work authorization; no sponsorship available)

Fluent in Mandarin Chinese (working language; close collaboration with domestic R&D required)

Bachelor's degree or above in Computer Science or a related field

4-6 years of hands-on experience in DevOps/SRE/Platform Engineering

Proficient in at least one major cloud platform (AWS/Azure/GCP), with a deep understanding of VPC, EC2, EKS/K8s, RDS, and IAM

Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance

Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm

Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.

Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry

Proficient in at least one development/scripting language: Python, Shell, Go

Excellent system design, analysis, and troubleshooting skills

Strong cross-team communication and collaboration abilities

PREFERRED QUALIFICATIONS:

Master's degree in Computer Science or a related field

Experience with global platforms, cross-border SRE, and multi-cloud O&M

Led platform reconstruction, self-healing systems, or observability initiatives

Go development, service mesh, chaos engineering, and capacity planning experience

Demonstrated success improving system availability, reducing incident rates, and increasing automation

Global technical vision and cross-cultural collaboration experience

Results-oriented and self-driven; experienced in technical evangelism/sharing

COMPENSATION:

Base Salary: $140,000 - $160,000 annually (top candidates may receive a 5-10% upward adjustment)

401(k): Dollar-for-dollar match up to 4% of salary

Medical Insurance

PTO: 12 days annually

Social Security & Housing Fund: Contributed per US legal requirements

WORK ENVIRONMENT:

Location: Silicon Valley, CA or Raleigh, NC (homebase available)

Department: Tech O&M Department

Working Style: Remote-first

Hours: 8 hours per day, weekends off

Travel: No business travel required

Expected Start: ASAP

Interview Process:

Round 1 (Online): Middle Platform Director, Technical Expert

Round 2 (Online): Head of HR

Round 3 (Online): CEO/Founder