Shared Tech Site Reliability Engineer

Warsaw - Poland

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Join us at Playtika (NASDAQ: PLTK), where we're driven by the belief that life needs play. We're on a mission to deliver infinite ways to play, using cutting-edge technologies like AI and machine learning to craft immersive experiences that connect, inspire, and entertain millions of players worldwide.

From our start as a small mobile games company founded in Israel to our current position as a publicly traded company and industry leader, we continue to be a dominant force in interactive entertainment. With a diverse portfolio of award-winning, category-leading Casual and Social Casino-themed games, including nine of the top 100 highest-grossing mobile games in the US, we're setting the standard for excellence.

Our success story is co-authored by a dynamic team of storytellers, strategists, creators, and data scientists who thrive on innovation. We are home to the best, advancing an inclusive culture that embraces our core values and reflects our agile DNA.

With a strong financial foundation, disciplined operations, an unwavering player-focused approach, and a relentless can-do spirit, we're well-positioned for sustained growth. If you're ready to join the driving force behind the evolution of interactive entertainment, we invite you to come play with us.

We are looking for an SRE / OPS / DevOps Engineer to join our infrastructure practice. This is a cross-functional role that combines Site Reliability Engineering, Operations, and DevOps disciplines. You will be embedded in a team that owns and operates complex multi-datacenter infrastructure supporting multiple high-traffic gaming studios at Playtika.

You will work alongside experienced engineers across the SRE, DBA, and Platform teams to keep our services reliable, scalable, and observable.

Responsibilities

Operate and maintain Kubernetes-based infrastructure (SpectroCloud / CloudStack) across multiple datacenters in the US.
Support and troubleshoot a wide range of stateful workloads running in k8s: Kafka, Redis / KeyDB / RedisLabs, MariaDB / Galera, Aerospike, SingleStore, Elasticsearch / OpenSearch
Participate in cloud migration projects: prepare environments, perform data sync and failover, and execute the final cutover
Manage and maintain monitoring, alerting, and observability stacks: Prometheus, VictoriaMetrics, Grafana, AlertManager, PagerDuty
Maintain and improve logging infrastructure (ELK / OpenSearch stack, Filebeat, Logstash), including configuration, performance tuning, and index lifecycle management
Configure and maintain load balancers (Nginx, MaxScale, internal PLB / IPVS-based solutions) and manage DNS records
Manage SSL certificate lifecycle automation (Sectigo, Prometheus / Grafana)
Administer and maintain secrets management systems (HashiCorp Vault, External Secrets Operator)
Participate in on-call duty rotation and respond to production incidents; follow and improve SRE alert handling guidelines
Contribute to GitOps workflows: maintain infrastructure-as-code in Git, and work with Flux CD deployment packages and Ansible playbooks
Review and extend automation scripts and tooling (primarily Bash and Python)
Provide SRE-level support to multiple game studios: investigate production issues, handle ChatOps requests, and collaborate with development teams
Write and maintain operational runbooks, SOPs, migration plans, and other technical documentation in Confluence
Perform capacity planning reviews and resource utilization analysis for datastores and cluster nodes
Participate in cross-team initiatives and contribute to platform-level improvements

Requirements

Solid hands-on experience with Linux systems (primarily Ubuntu 22.04 / 24.04 LTS)
Practical knowledge of Kubernetes administration: workloads, operators, resource management, node maintenance, cluster upgrades
Experience operating and troubleshooting stateful services in k8s: at least one of Kafka, Redis / KeyDB, MariaDB / Galera, Elasticsearch / OpenSearch, Aerospike
Familiarity with the GitOps approach and tools: Git, Flux CD, Helm, Kustomize
Monitoring and observability experience: Prometheus ecosystem, VictoriaMetrics, Grafana, AlertManager
Practical experience with the ELK / OpenSearch stack (Filebeat, Logstash, index management)
Solid scripting skills (Bash); ability to read and modify Python
Understanding of networking fundamentals: TCP/IP, DNS, load balancing, ports and protocols, VIPs, VLANs
Experience with HashiCorp Vault or similar secrets management solutions
Familiarity with Ansible for infrastructure automation
Ability to troubleshoot complex distributed system issues under production pressure
Strong communication skills: ability to collaborate across SRE, DBA, R&D, and NOC teams
Experience working with a Jira-based workflow and documenting in Confluence

Nice to Have

Experience with cloud migration projects (datacenter-to-cloud or cloud-to-datacenter)
Knowledge of additional datastores: SingleStore (formerly MemSQL), Aerospike, Couchbase, KeyDB
Familiarity with HashiCorp Boundary for secure remote access, and with Ansible dynamic inventory
Experience with load balancer solutions: Nginx, F5, IPVS
Understanding of high-availability and DR patterns: active-active, active-passive, failover procedures
Exposure to SSL certificate lifecycle automation (cert-manager, Sectigo)
Knowledge of PagerDuty or similar on-call and incident management platforms
Experience with AWS (IAM, S3, EC2) in the context of infrastructure operations
Understanding of SLO/SLA concepts and how they apply to infrastructure reliability
Familiarity with Python-based monitoring collectors or custom exporters for Prometheus
Experience with capacity planning, performance analysis, and resource optimization in production environments

Our Stack at a Glance

Container Orchestration: Kubernetes (SpectroCloud, CloudStack), Flux CD, Helm, Kustomize
Databases & Datastores: MariaDB / Galera, MaxScale, SingleStore, Aerospike, Redis / KeyDB, RedisLabs (Redis Enterprise), Couchbase
Message Brokers: Apache Kafka, Kafka MirrorMaker
Search & Logging: Elasticsearch / OpenSearch (ECK), Kibana / OpenSearch Dashboards, Filebeat, Logstash
Monitoring: Prometheus, VictoriaMetrics, Grafana, AlertManager, PagerDuty
Load Balancing: Nginx, IPVS, MaxScale
Secrets & Access: HashiCorp Vault, External Secrets Operator, HashiCorp Boundary
Automation & CI: Ansible, Jenkins, Bash, Python
Cloud & Infra: CloudStack, AWS, DNS, Sectigo (SSL automation)
Collaboration: Jira, Confluence, Git (GitHub), Teams

About Company

Playtika

Discover how Playtika blends art and science to create engaging experiences for millions worldwide, setting the standard for gaming brands and companies.
