We are looking for experienced Senior Site Reliability Engineers (SREs) to join our team and help maintain and enhance the reliability, scalability, and performance of our cloud-based systems. Our platform processes vast amounts of data in real time and operates 24/7 with high availability, requiring expertise in automation, monitoring, and incident resolution.
This role requires onsite presence at our office 4 days a week to support effective collaboration and teamwork.
Design, implement, and maintain highly available, fault-tolerant cloud infrastructure with an Infrastructure-as-Code (IaC) approach.
Develop and optimize automated CI/CD pipelines following the GitOps methodology.
Improve service scalability and engineering productivity through automation.
Monitor and maintain production systems, proactively identifying and resolving performance bottlenecks.
Implement security and compliance best practices.
Develop and maintain observability solutions, ensuring comprehensive monitoring, alerting, and logging across distributed systems.
Participate in an on-call rotation, incident resolution, and root cause analysis to enhance system resilience.
Plan and execute disaster recovery and system capacity scaling strategies.
Collaborate closely with development and architecture teams to drive performance improvements and optimize infrastructure.
4 years of experience as an SRE, Systems Engineer, or DevOps Engineer supporting large-scale, high-availability systems.
Strong Linux administration skills and knowledge of networking fundamentals (TCP/IP, DNS, routing).
Hands-on experience with public cloud providers (AWS, GCP, or Azure) and container orchestration using Kubernetes & Docker.
Proven expertise in Infrastructure-as-Code tools (Terraform, Ansible, ArgoCD, or Helm).
Proficiency in automation and scripting using Python, Go, or Bash.
Experience working with distributed systems and databases such as Kafka, Cassandra, ClickHouse, PostgreSQL, MySQL, MongoDB, or VictoriaMetrics.
Familiarity with CI/CD tools such as GitLab CI/CD and Spinnaker, and experience deploying high-availability applications.
Strong knowledge of monitoring and logging systems like Prometheus, Grafana, ELK Stack, Zabbix, or CloudWatch.
Effective communication and problem-solving skills, with the ability to work in a globally distributed team.
Fluent English (written & spoken).
Experience with high-load distributed systems and microservices.
Knowledge of VoIP solutions, contact center technologies, or SaaS monitoring practices.
Experience with JVM tuning, Nginx administration, and high-availability configurations (HAProxy, Keepalived).
Familiarity with ITIL or other IT service management frameworks.
A well-coordinated, professional team working on cutting-edge technologies.
Interesting and challenging tasks in a dynamic environment with opportunities for professional growth.
Additional Health and Life Insurance Package.
Employee Assistance Program.
25 vacation days.
200 BGN Digital Food Vouchers.
120 BGN gross Working Expenses Allowance, paid as part of the salary.
Full Time