Shared Tech Site Reliability Engineer

Playtika

Job Location:

Warsaw - Poland

Monthly Salary: Not Disclosed
Posted on: 10 hours ago
Vacancies: 1 Vacancy

Job Summary

We are looking for an SRE / OPS / DevOps Engineer to join our infrastructure practice. This is a cross-functional role that combines Site Reliability Engineering, Operations, and DevOps disciplines. You will be embedded in a team that owns and operates complex multi-datacenter infrastructure supporting multiple high-traffic gaming studios at Playtika.

You will work alongside experienced engineers across SRE, DBA, and Platform teams to keep our services reliable, scalable, and observable.

Responsibilities

  • Operate and maintain Kubernetes-based infrastructure (SpectroCloud / CloudStack) across multiple datacenters in the US.
  • Support and troubleshoot a wide range of stateful workloads running in k8s: Kafka, Redis / KeyDB / RedisLabs, MariaDB / Galera, Aerospike, Singlestore, Elasticsearch / OpenSearch
  • Participate in cloud migration projects: prepare environments, perform data sync and failover, execute final cutover
  • Manage and maintain monitoring, alerting, and observability stacks: Prometheus, VictoriaMetrics, Grafana, AlertManager, PagerDuty
  • Maintain and improve logging infrastructure: ELK / OpenSearch stack, Filebeat, Logstash, including configuration, performance tuning, and index lifecycle management
  • Configure and maintain load balancers (Nginx, MaxScale, internal PLB / IPVS-based solutions) and manage DNS records
  • Manage SSL certificate lifecycle automation (Sectigo, Prometheus / Grafana)
  • Administer and maintain secrets management systems (HashiCorp Vault, External Secrets Operator)
  • Participate in on-call duty rotation and respond to production incidents; follow and improve SRE alert handling guidelines
  • Contribute to GitOps workflows: maintain infrastructure-as-code in Git, work with Flux CD deployment packages and Ansible playbooks
  • Review and extend automation scripts and tooling (primarily Bash and Python)
  • Provide SRE-level support to multiple game studios: investigate production issues, handle ChatOps requests, and collaborate with development teams
  • Write and maintain operational runbooks, SOPs, migration plans, and other technical documentation in Confluence
  • Perform capacity planning reviews and resource utilization analysis for datastores and cluster nodes
  • Participate in cross-team initiatives and contribute to platform-level improvements

Requirements

  • Solid hands-on experience with Linux systems (primarily Ubuntu 22.04 / 24.04 LTS)
  • Practical knowledge of Kubernetes administration: workloads, operators, resource management, node maintenance, cluster upgrades
  • Experience operating and troubleshooting stateful services in k8s: at least one of Kafka, Redis / KeyDB, MariaDB / Galera, Elasticsearch / OpenSearch, Aerospike
  • Familiarity with the GitOps approach and tools: Git, Flux CD, Helm, Kustomize
  • Monitoring and observability experience: Prometheus ecosystem, VictoriaMetrics, Grafana, AlertManager
  • Practical experience with ELK / OpenSearch stack (Filebeat, Logstash, index management)
  • Solid scripting skills (Bash); ability to read and modify Python
  • Understanding of networking fundamentals: TCP/IP, DNS, load balancing, ports and protocols, VIPs, VLANs
  • Experience with HashiCorp Vault or similar secrets management solutions
  • Familiarity with Ansible for infrastructure automation
  • Ability to troubleshoot complex distributed system issues under production pressure
  • Strong communication skills: ability to collaborate across SRE, DBA, RnD, and NOC teams
  • Experience working with Jira-based workflow and documenting in Confluence

Nice to Have

  • Experience with cloud migration projects (datacenter-to-cloud or cloud-to-datacenter)
  • Knowledge of additional datastores: Singlestore (SingleStore / MemSQL), Aerospike, Couchbase, KeyDB
  • Familiarity with HashiCorp Boundary for secure remote access and Ansible dynamic inventory
  • Experience with load balancer solutions: Nginx, F5, IPVS
  • Understanding of high-availability and DR patterns: active-active, active-passive, failover procedures
  • Exposure to SSL certificate lifecycle automation (cert-manager, Sectigo)
  • Knowledge of PagerDuty or similar on-call and incident management platforms
  • Experience with AWS (IAM, S3, EC2) in the context of infrastructure operations
  • Understanding of SLO/SLA concepts and how they apply to infrastructure reliability
  • Familiarity with Python-based monitoring collectors or custom exporters for Prometheus
  • Experience with capacity planning, performance analysis, and resource optimization in production environments

Our Stack at a Glance

Container Orchestration

Kubernetes (SpectroCloud, CloudStack), Flux CD, Helm, Kustomize

Databases & Datastores

MariaDB / Galera, MaxScale, Singlestore, Aerospike, Redis / KeyDB, RedisLabs (Redis Enterprise), Couchbase

Message Brokers

Apache Kafka, Kafka Mirror Maker

Search & Logging

Elasticsearch / OpenSearch (ECK), Kibana / OpenSearch Dashboards, Filebeat, Logstash

Monitoring

Prometheus, VictoriaMetrics, Grafana, AlertManager, PagerDuty

Load Balancing

Nginx, IPVS, MaxScale

Secrets & Access

HashiCorp Vault, External Secrets Operator, HashiCorp Boundary

Automation & CI

Ansible, Jenkins, Bash, Python

Cloud & Infra

CloudStack, AWS, DNS, Sectigo (SSL automation)

Collaboration

Jira, Confluence, Git (GitHub), Teams


Required Experience:

IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Discover how Playtika blends art and science to create engaging experiences for millions worldwide, setting the standard for gaming brands and companies.
