We are looking for a Mid-level Site Reliability Engineer (SRE) to help transition our incident management from reactive firefighting to proactive reliability engineering.
You will play a key role in improving observability, reducing incident frequency, and helping engineering teams understand how systems behave in production.
KEY RESPONSIBILITIES
Own and improve monitoring, alerting, and observability for production systems
Lead or contribute to incident investigations and postmortems
Design alerts based on symptoms and user impact rather than infrastructure noise
Use observability tools to analyze performance, errors, and traffic patterns
Identify reliability risks before they turn into incidents
Improve runbooks, on-call processes, and operational readiness
Work closely with software teams to improve system resilience
Automate repetitive operational tasks
REQUIRED SKILLS AND EXPERIENCE
Strong Linux experience in production environments
Hands-on experience with at least one major cloud provider (AWS preferred)
Solid understanding of monitoring, alerting, and incident response
Experience with observability tools (New Relic, Prometheus, Datadog, etc.)
Scripting or automation experience (Bash, Python, or similar)
Understanding of distributed systems fundamentals
Comfortable participating in on-call rotations
NICE TO HAVE
Experience with Infrastructure as Code (Terraform, CloudFormation, etc.)
Experience with containers or orchestration (ECS, Kubernetes, Docker)
Experience supporting PHP or similar application stacks
Familiarity with SRE concepts such as SLIs, SLOs, and error budgets
WHAT SUCCESS LOOKS LIKE
Reduced number of repeat incidents
Clear and actionable alerts
Faster detection and resolution of incidents
Improved visibility into system health and performance
Engineering teams that trust monitoring data
WHY JOIN US
Real influence over how reliability is implemented across the company
Work on systems operating at meaningful scale
Opportunity to grow into Senior SRE or SRE Tech Lead roles
Strong focus on engineering quality rather than ticket volume
Flexible working arrangements, with a hybrid model that lets you work both from our Sofia office and from home
Knowledge of foreign languages:
Proficiency in English at level B1 or higher of the Common European Framework of Reference for Languages (CEFR).
With over 1,700 schools and more than 1 million users, Shkolo is Bulgaria's leading Management Information System (MIS) provider. Now a proud member of the Juniper Education group, Shkolo is expanding its products to over 16,000 schools worldwide. At Shkolo, we are revolutionizing educa ...