A fast-growing provider of AI-powered solutions is scaling its operations. With a strong customer base and increasing demand, the existing engineering team is under pressure to handle both infrastructure improvements and customer-facing support.
To meet this growth, the company is looking to add an Infrastructure Engineer to join a team of two (as the third engineer), supporting Kafka, Redis, OpenSearch, RabbitMQ, and ClickHouse for its products.
Tasks
- Manage, monitor, and optimize ClickHouse clusters in production, including schema design, query performance tuning, replication configuration, and capacity planning;
- Operate and maintain Kafka clusters, OpenSearch deployments, and other distributed systems, ensuring high availability and optimal performance;
- Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, implementing best practices for resource management and scaling;
- Implement and maintain GitOps workflows for infrastructure and application deployments, ensuring version-controlled and automated deployment processes;
- Design and implement comprehensive monitoring, logging, and alerting solutions for distributed systems, enabling proactive issue detection and rapid troubleshooting;
- Conduct performance analysis, identify bottlenecks, and implement optimizations across distributed systems to meet SLA requirements and improve system resilience;
- Create and maintain technical documentation, runbooks, and operational procedures while collaborating with development teams to ensure smooth integration and operations.
Requirements
- Hands-on experience operating distributed systems in production environments, with a strong understanding of distributed computing concepts, data consistency, and fault tolerance;
- Solid experience with ClickHouse, including cluster management, MergeTree engine families, data modeling, query optimization, and replication strategies;
- Practical experience deploying and managing applications on Kubernetes, including StatefulSets, persistent volumes, networking, and security configurations;
- Working knowledge of Apache Kafka (brokers, topics, partitions, consumer groups) and OpenSearch or similar search and analytics engines;
- Experience with GitOps practices and Infrastructure as Code tools (Terraform, Helm, or similar), with the ability to manage infrastructure through declarative configuration;
- Proficiency with monitoring and observability platforms (Prometheus, Grafana, or similar) and experience implementing metrics collection and alerting strategies;
- Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure), including compute, storage, and networking services;
- Strong scripting and programming skills in Python, Go, or Bash for automation, tooling development, and operational tasks.
Nice to have:
- Experience with other distributed data systems (Redis, Spark, Flink, etc.);
- Knowledge of data streaming patterns and event-driven architectures;
- Strong analytical and troubleshooting skills, with the ability to diagnose complex distributed systems issues, coupled with clear communication skills for cross-functional collaboration.
Benefits
Working conditions:
- This role is available only to candidates based in Croatia, Serbia, Portugal, or Poland;
- Duration: 1 year, with the possibility of extension;
- Locations: Serbia, Portugal, Croatia, Poland;
- Overlap: until 11:00 AM PST at most;
- Employment Type: Full-time.