Site Reliability Engineer (SRE) With Observability
Experience: 10 years
We are looking for an experienced Site Reliability Engineer to join our team and keep our systems reliable, scalable, and performant, especially during high-visibility, high-traffic events. This role focuses on proactively preparing infrastructure and services for major events, designing scalable solutions, automating workflows, and acting as a first responder during live incidents.
You will collaborate with engineering, product, and event operations teams to ensure our customers experience smooth, uninterrupted service, even at massive scale.
What You'll Do
Serve as an on-call point of contact during live events
Monitor system health in real time and proactively address performance issues
Rapidly diagnose and mitigate production issues under pressure
Lead post-event reviews, analyzing performance data and incident timelines
Document learnings and recommendations to improve reliability at scale