SITE RELIABILITY ENGINEER (MID-LEVEL)

Shkolo

Job Location:

Sofia - Bulgaria

Monthly Salary: Not Disclosed
Posted on: 22 hours ago
Vacancies: 1 Vacancy

Job Summary

ABOUT THE ROLE

We are looking for a Mid-level Site Reliability Engineer (SRE) to help transition our incident management from reactive firefighting to proactive reliability engineering.

You will play a key role in improving observability, reducing incident frequency, and helping engineering teams understand how systems behave in production.

KEY RESPONSIBILITIES

Own and improve monitoring, alerting, and observability for production systems

Lead or contribute to incident investigations and postmortems

Design alerts based on symptoms and user impact rather than infrastructure noise

Use observability tools to analyze performance, errors, and traffic patterns

Identify reliability risks before they turn into incidents

Improve runbooks, on-call processes, and operational readiness

Work closely with software teams to improve system resilience

Automate repetitive operational tasks

REQUIRED SKILLS AND EXPERIENCE

Strong Linux experience in production environments

Hands-on experience with at least one major cloud provider (AWS preferred)

Solid understanding of monitoring, alerting, and incident response

Experience with observability tools (New Relic, Prometheus, Datadog, etc.)

Scripting or automation experience (Bash, Python, or similar)

Understanding of distributed systems fundamentals

Comfortable participating in on-call rotations

NICE TO HAVE

Experience with Infrastructure as Code (Terraform, CloudFormation, etc.)

Experience with containers or orchestration (ECS, Kubernetes, Docker)

Experience supporting PHP or similar application stacks

Familiarity with SRE concepts such as SLIs, SLOs, and error budgets

WHAT SUCCESS LOOKS LIKE

Reduced number of repeat incidents

Clear and actionable alerts

Faster detection and resolution of incidents

Improved visibility into system health and performance

Engineering teams that trust monitoring data

WHY JOIN US

Real influence over how reliability is implemented across the company

Work on systems operating at meaningful scale

Opportunity to grow into Senior SRE or SRE Tech Lead roles

Strong focus on engineering quality rather than ticket volume

Flexible working arrangements: a hybrid model that lets you work both from our Sofia office and from home

Knowledge of foreign languages:
Proficiency in English, at least level B1 of the Common European Framework of Reference for Languages.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

With over 1,700 schools and more than 1 million users, Shkolo is Bulgaria's leading Management Information System (MIS) provider. Now a proud member of the Juniper Education group, Shkolo is expanding its products to over 16,000 schools worldwide. At Shkolo, we are revolutionizing educa ...
