Senior Site Reliability Engineer

Job Location

Novato, CA - USA

Yearly Salary

$98,400 - $145,620

Vacancy

1 Vacancy

Job Description


#LI-Onsite

On-Call Requirement: Yes (Periodic Rotation)

Who We Are

2K is headquartered in Novato, California, and is a wholly owned label of Take-Two Interactive Software, Inc. (NASDAQ: TTWO). Founded in 2005, 2K Games is a global video game company, publishing titles developed by some of the most influential game development studios in the world. Our studios responsible for developing 2K's portfolio of world-class games across multiple platforms include Visual Concepts, Firaxis, Hangar 13, CatDaddy, Cloud Chamber, 31st Union, HB Studios, and 2K SportsLab. Our portfolio of titles is expanding due to our global strategic plan, building and acquiring exciting studios whose content continues to inspire all of us! 2K publishes titles in today's most popular gaming genres, including sports, shooters, action, role-playing, strategy, casual, and family entertainment.

Our team of engineers, marketers, artists, writers, data scientists, producers, thinkers, and doers are the professional publishing stewards of 2K's portfolio, which currently includes several AAA sports and entertainment brands, including global powerhouse NBA 2K; renowned BioShock, Borderlands, Mafia, Sid Meier's Civilization, and XCOM brands; popular WWE 2K and WWE SuperCard franchises; TopSpin 2K25; as well as the critically and commercially acclaimed PGA TOUR 2K.

At 2K, we pride ourselves on creating an inclusive work environment, which means encouraging our teams to "Come as You Are" and do your best work! We encourage ALL applicants to explore our global positions, even if they don't meet every requirement for the role. If you're interested in the job and think you have what it takes to work at 2K, we encourage you to apply!

What We Need

We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in Unix/Linux systems architecture, distributed infrastructure, and automation tooling to help scale and sustain mission-critical platforms that serve millions of active users worldwide. You'll play a leading role in building resilient, high-performance services for live gaming environments, balancing system stability, scalability, and operational velocity.


As part of our SRE team, you'll work across a complex technology stack spanning AWS, GCP, and hybrid on-prem environments. You'll be responsible for building auto-scaling, self-healing Unix-based systems, optimizing OS internals, and integrating authentication across enterprise identity systems. You'll lead the design of high-availability architecture, implement disaster recovery, apply advanced performance tuning across kernel, network, and filesystem layers, and define/enforce observability standards using Datadog, Grafana, and open-source telemetry tools. Your efforts will power real-time insights, automated alerting, and rapid incident detection and resolution. As a senior member of the on-call rotation, you'll handle critical outages, lead post-mortems, and design long-term preventative solutions.


Automation is foundational to this role. You'll build and maintain infrastructure-as-code (IaC) with tools like Terraform, Puppet, and Ansible, orchestrating deployments, configurations, and updates across heterogeneous environments. You'll extend platform APIs and backend tooling using Python and Shell scripts, driving continuous improvement in platform delivery.


Collaboration is key: You'll partner with backend and gameplay engineers to embed reliability into every layer of the tech stack. You'll contribute to shared reliability standards, CI/CD integration pipelines, provisioning templates, and internal documentation. As a mentor, you'll share your expertise in debugging, system architecture, and tooling best practices, empowering engineers across disciplines to build complex, resilient systems.

What You'll Do

Systems Design, Scaling & Resilience

  • Design and operate distributed Unix-based systems (Red Hat, Ubuntu, Debian, CentOS).
  • Implement auto-scaling and self-healing infrastructure to ensure uptime and durability.
  • Tune system internals, including kernel parameters, networking, and filesystems, for high performance.
  • Maintain timely OS patching and compliance posture across environments.
  • Integrate systems with enterprise identity services such as Active Directory, LDAP, and Kerberos.

Automation & Infrastructure as Code

  • Build and maintain infrastructure automation using Terraform, Puppet, and Ansible.
  • Automate deployment pipelines, service configurations, and patch management.
  • Develop scripts and services in Python and Bash/Shell to enhance infrastructure delivery workflows.
  • Extend APIs and platform automation to drive efficiency and repeatability.

Observability, Monitoring & Incident Response

  • Develop observability stacks using Datadog, Prometheus, Grafana, and open-source telemetry tools.
  • Create dashboards and SLO/SLI-based alerts for real-time monitoring of production systems.
  • Participate in a global 24/7 on-call rotation, leading response for high-severity incidents.
  • Conduct post-incident analysis (RCA) and drive remediations that improve long-term reliability.

Multi-Cloud & Hybrid Platform Engineering

  • Manage workloads across AWS, GCP, and on-prem infrastructure.
  • Design and implement multi-region failover, load balancing, and disaster recovery strategies.
  • Work with both VM-based and containerized/Kubernetes platforms, including vSphere/VMware.
  • Support backup, restore, and DR tooling with strict availability targets.

Collaboration, Standards & Enablement

  • Partner with development teams to embed reliability in deployment pipelines.
  • Help define system architecture standards and maintain robust platform documentation.
  • Mentor engineers in Unix performance, observability, and debugging practices.
  • Champion a culture of automation, resilience, and continuous improvement.

What Will Make You A Great Fit

  • 7 years in SRE, Infrastructure, or Systems Engineering roles managing production services.
  • Deep expertise with Unix/Linux systems, including Red Hat, Debian, Ubuntu, and CentOS.
  • Experience in kernel tuning, performance profiling, and debugging complex system issues.
  • 6 years working in AWS and/or GCP with large-scale distributed applications.
  • Advanced skills in Python, Shell scripting, and optionally Go or Ruby.
  • Strong grasp of IaC tools like Terraform, Ansible, and Puppet.
  • Experience running hybrid infrastructure (cloud/on-prem) with VMware, containers, and Kubernetes.
  • Hands-on experience with monitoring, telemetry, and observability stacks.

Additional qualities

  • Experience supporting live game services or other high-throughput, low-latency platforms.
  • Contributions to open-source tooling in observability, automation, or infrastructure domains.
  • Familiarity with telemetry pipelines like ETL, Flink, Kafka, or Kinesis.
  • Experience with Kubernetes-native tooling and service meshes (e.g., Istio, Linkerd).
  • Operational knowledge of MySQL/Postgres in cloud-native and bare-metal deployments.

You thrive in collaborative environments that value technical skill and operational excellence. Your passion for high-quality infrastructure empowers development teams and enhances productivity.

As an equal opportunity employer, we are committed to ensuring that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform their essential job functions, and to receive other benefits and privileges of employment. Please contact us if you need reasonable accommodation.

Please note that 2K Games and its studios never use instant messaging apps or personal email accounts to contact prospective employees or conduct interviews, and when emailing, only use accounts.


The pay range for this position in California at the start of employment is expected to be between $98,400 and $145,620 per year. However, base pay offered is based on market location and may vary further depending on individualized factors for job candidates, such as job-related knowledge, skills, experience, and other objective business considerations. Subject to those same considerations, the total compensation package for this position may also include other elements, including a bonus and/or equity awards and eligibility to participate in our 401(K) plan and Employee Stock Purchase Program. Regular full-time employees are also eligible for a range of benefits at the Company, including: medical, dental, vision, and basic life insurance coverage; 14 paid holidays per calendar year; paid vacation time per calendar year (ranging from 15 to 25 days) or eligibility to participate in the Company's discretionary time off program; up to 10 paid sick days per calendar year; paid parental and compassionate leave; wellbeing programs for mental health and other wellness support; family planning support through Maven; commuter benefits; and reimbursements for fitness-related expenses.


Required Experience:

Senior IC

Employment Type

Full Time
