About kwiff:
kwiff isn't gambling as you know it. We're redefining the experience with a bold, player-first approach to sports betting and casino, powered by our proprietary tech platform, fully automated sportsbook and standout UX across web and mobile. What truly sets us apart? Supercharging. Our signature feature allows players' odds, cash outs and more to be supercharged at random, creating surprise wins and a thrilling betting experience.
The role & responsibilities:
We're looking for a Site Reliability Engineer to join our team. This role bridges development and operations, with a strong focus on automation, monitoring, incident management and reliability engineering. You'll be pivotal in ensuring our platform runs smoothly, securely and at scale, while collaborating closely with development and product teams.
Platform Reliability & Incident Management
- Act as the primary responder in the on-call rotation for production incidents.
- Lead incident response and coordination during platform emergencies.
- Own and maintain the incident management process, including documentation and communication protocols.
- Facilitate post-incident reviews and ensure follow-up actions are tracked and implemented (on-call engineer leads the review).
Monitoring & Observability
- Own the implementation and optimisation of monitoring tools across the platform.
- Design and implement monitoring for application performance, database health, cache clusters, API endpoints and service latencies.
- Establish and track reliability metrics.
Security & Dependency Management
- Monitor and alert on security risks via application monitoring.
- Ensure secure handling and rotation of monitoring credentials and tokens in DataDog and other tools.
- Track and manage external API versions and deprecation timelines, including documentation, alerting and coordination with dev teams.
- Own the dependency update process across services: track outdated dependencies, prioritise updates, coordinate rollouts and monitor for vulnerabilities.
- Maintain dashboards for dependency freshness and API version status.
- Contribute to security incident response when related to monitoring or integrations.
Performance & Reliability Engineering
- Define and maintain SLOs for critical services.
- Create and manage a stability and performance-focused feature roadmap.
- Implement automated alerting with thresholds that minimise alert fatigue.
- Contribute to architectural discussions, prioritising scalability, reliability and cost efficiency.
- Participate in release planning and deployment processes to enforce reliability standards.
Release Management
- Review and validate production deployments from a reliability perspective.
- Collaborate with developers on robust deployment practices.
- Monitor post-deployment performance and stability.
- Support deployment automation improvements.
Collaboration & Communication
- Participate in daily standups and team meetings.
- Respond to requests in relevant Slack channels.
- Maintain clear documentation of SRE processes and procedures.
- Track work items in ClickUp for visibility and prioritisation.
- Work closely with development, operations and product teams to embed reliability best practices.
Skills we're looking for:
- Proven experience in Site Reliability Engineering, DevOps or related fields.
- Strong knowledge of monitoring/observability tools (DataDog preferred).
- Solid understanding of cloud infrastructure, APIs and CI/CD pipelines.
- Experience with incident response, incident management and post-mortem facilitation.
- Familiarity with security practices and dependency management.
- Ability to work cross-functionally and communicate clearly with technical and non-technical stakeholders.
Nice to haves:
- Experience with SLO framework development.
- Hands-on experience implementing DORA and SPACE metrics.
- Familiarity with HappierPlace or similar dev environment management tools.
- Prior exposure to cost-driven architectural decision making.
What we can offer you:
At kwiff, we believe in rewarding our team with an environment that's both exciting and supportive. Here's what you'll enjoy as part of our team:
- Private Healthcare: Comprehensive medical insurance through Vitality Health.
- Life Insurance: Coverage through Yulife for added peace of mind.
- Performance Bonuses: Quarterly bonuses based on team achievements.
- Wellbeing Allowance: Spend on gym memberships or other wellness activities.
- Lunch Budget: Enjoy a budget to spend on food and beverages when working from the office.
- Sustainable Commuting: Cycle to Work schemes on offer.
- Parental Support: Nursery schemes to reduce monthly fees.
- Long Service Rewards: Exciting travel rewards for dedication after five years of service.
- Learning Budget: Financial support for role-specific training to level up your skills.
- Team Socials & Activities: Regular events, plus office perks like ping pong, darts and PlayStation.
Why join us?
At kwiff, we don't just follow trends. We create them. From unlimited betting options to surprise wins and slick user journeys, we're building a product that players love. Join us and help shape the future of betting.
Kwiff is an equal opportunity employer. We value diversity and are committed to creating an inclusive environment for all employees.
We aim for equity at all three stages of the recruitment process. Please let us know if there's anything we can do to make the process more accessible to you.