This is a remote position.
We are looking for a Lead Site Reliability Engineer to join our team.
Responsibilities:
- Provide tech leadership in SRE execution and planning
- Lead complex infra projects for both internal and external stakeholders
- Orchestrate and run our infrastructure
- Add to and tune our monitoring
- Reduce or automate manual processes
- Be on an on-call (PagerDuty) rotation to respond to incidents that impact availability
- Plan the growth of our infrastructure as we continue to scale
- Manage vendor relationships
- Manage the technical roadmap for the SRE team
- Monitor and optimise infrastructure costs
- Support engineers and improve development workflows
- Talk directly to large customers
- Co-ordinate with team members across timezones
Requirements:
- Build a technically competent SRE team through a clear set of OKRs
- Build essential tooling to improve infra ops
- Have run global mission-critical infrastructure
- Have managed systems that handle high request volumes
- Know your way around Linux and the Unix Shell
- Have used configuration management systems
- Have used infrastructure automation tools
- Have implemented CI / CD pipelines
Have experience with some of the following technologies:
- Kubernetes
- Docker
- Terraform
- Ansible
- Nginx
- GitHub Actions
- Grafana
- Prometheus
- Loki
- AWS
- Google Cloud
- Major CDN vendors
- GitHub Actions workflows and managing self-hosted runners
- Video streaming technologies (HLS, RTMP, transcoding, etc.)
- COBOL
- Web3 / Blockchain, particularly the Ethereum ecosystem
Benefits:
- Work Location: Remote
- 5-day working week