Skills
Proficiency in one or more major cloud platforms such as Azure (preferred), Google Cloud, AWS, or others
Experience with modularized IaC (Infrastructure as Code) and industry tooling, e.g. Terraform/OpenTofu, Ansible, Packer
Configuration management and secrets management (Consul, Vault, KMS systems, Consul Template)
Operational concerns: Disaster Recovery, Disaster Resilience, Monitoring, Alerting
ITSM: Change, Knowledge, and Incident Management
Containerization technologies: Docker/containerd, Kubernetes (K8s, K3s), container registries, and artifact tools (e.g. Artifactory, GCR, ACR, ECR)
Administration of Kubernetes clusters.
Understanding of microservices principles and best practices for designing scalable and modular architectures, including API Gateway technologies
Understanding of event-driven architecture principles, best practices, and tooling (e.g. Pub/Sub, Kafka)
Familiarity with serverless computing services and functions (e.g. Azure Functions, AWS Lambda)
Proficient in cloud networking concepts including VPCs, subnets, routing, load balancing, firewalling (including WAF), and security groups.
Strong knowledge of cloud security best practices including identity and access management (IAM), encryption, security group configurations, and SAST & DAST tooling (CrowdStrike, Prisma Cloud, etc.)
Experience with cloud monitoring and logging tools like AWS CloudWatch, EFK/ELK Stack, OpenTelemetry, Dynatrace, Datadog, New Relic, etc.
Knowledge of diagramming tools and methodologies (e.g. Miro, FigJam)
Educated to degree level in Computer Science or an equivalent field
Responsibilities
Help shape the Cloud Architectural Artifacts
Networking, firewalling, routing
Implementation-level architecture for scalable IaC strategies (e.g. modularization)
Platform vs. service level
Contribute to and maintain our org-wide Architecture Decision Records
Enable our Cloud Policies, Procedures, and Standards
Disaster Resiliency, Recovery, and Availability Planning
IaC tooling use and quality gating
Cloud technology stack/services consultancy and selection in accordance with technical and quality requirements (scalability, performance, security, compliance)
Support our Cloud Operational Practices
Drive Ecommerce and Parts Town-wide projects
Cloud infrastructure TCO modeling and optimization
Capacity planning by continuously monitoring utilized resources and demand trends
Manage relationships with cloud service providers/vendors to ensure smooth operations and support.
Lead and ensure compliance with organizational security practices and auditing
Ensure maximal uptime of our infrastructure and services through monitoring, alerting, golden signals, and on-call support
Drive best-in-class cloud strategies
Knowledge transfer, training, and mentoring of cloud practices across the engineering team
Experience Required: 8-10 years
Required Skills:
SRE
IT Services and IT Consulting