Site Reliability/DevOps Engineer (Contract) - Gauteng/Hybrid - ISB5303336

iSanqa Resourcing

Job Location: Midrand, South Africa

Monthly Salary: Not Disclosed
Posted on: 8 days ago
Vacancies: 1 Vacancy

Job Summary

Our client is seeking an Expert Reliability Engineer to take complete ownership of the availability, performance, and scalability of Retail Sales Services.

This product is the backbone for dealers, agents, and NSCs globally, facilitating vehicle configuration, contract management, and stock searches for all brands.

Key Highlights

Expert Level: 10 to 15 years of hands-on experience in DevOps, Infrastructure, and SRE.

Infrastructure Mastery: Deep expertise in AWS, Terraform, Kubernetes, and CI/CD.

Operational Excellence: High-stakes responsibility for the reliability of global retail applications.

Work-Life Balance: High flexibility with a 1960-hour annual model and hybrid work options.

Position Details

  • Contract Duration: 01 June 2026 to 31 December 2028.
  • Location: Hybrid, with a Midrand / Menlyn / Rosslyn / home office rotation.
  • Role Level: Expert.
  • Experience Required: 10 to 15 years.

Qualifications & Experience

  • Degree in Computer Science, IT, or equivalent practical experience.
  • 10 to 15 years of hands-on experience in SRE/DevOps and cloud operations.
  • Relevant certifications: AWS Solutions Architect, AWS DevOps Engineer, CKA/CKAD, Terraform Associate (highly advantageous).

Essential Skills & Technologies

Cloud Infrastructure & Operations:

  • Deep hands-on experience with AWS (EC2, ECS/EKS, RDS, S3, VPC, IAM, CloudFront, Lambda)
  • Proven ability to design, build, and operate scalable, highly available cloud infrastructure
  • Strong experience with infrastructure-as-code (Terraform preferred, CloudFormation)
  • Solid experience with containerization and orchestration (Docker, Kubernetes)
  • Hands-on experience with cloud cost optimization - you know how to cut waste and right-size resources
  • Experience with capacity planning, scaling strategies, and performance tuning

CI/CD & Automation:

  • Deep experience building and maintaining CI/CD pipelines (GitHub Actions preferred, Jenkins)
  • Strong automation mindset - you automate everything that can be automated
  • Experience with build tools and artifact management (Maven, Gradle, GitHub Packages, ECR)
  • Proficiency in scripting and tooling (Bash, Python) to solve real operational problems
  • Experience with GitOps workflows and deployment automation

Security & Compliance:

  • Hands-on experience implementing security controls in CI/CD pipelines (SAST, DAST, dependency scanning)
  • Knowledge of container security best practices (image scanning, runtime security, least privilege)
  • Experience with IAM policies, network security, and secrets management (AWS Secrets Manager, Vault)
  • Familiarity with compliance frameworks and the ability to translate security requirements into implementations
  • Pragmatic approach to security - you find the right balance between security and velocity

Monitoring, Observability & Incident Response:

  • Experience with monitoring and alerting solutions (Prometheus, Grafana, CloudWatch, ELK/OpenSearch)
  • Ability to build meaningful dashboards that provide real operational insight
  • Strong troubleshooting and incident response skills - you stay calm under pressure and fix things fast
  • Experience with post-incident root cause analysis and driving corrective actions

Mindset & Way of Working:

  • Pragmatic doer - you bias towards action and delivering results over endless discussions
  • Comfortable working in agile teams (Scrum/SAFe) alongside developers, architects, and product owners
  • Strong sense of ownership - if it's broken, it's your problem until it's fixed
  • Ability to prioritize ruthlessly - you know what matters and focus on high-impact work
  • Clear communicator who can explain technical decisions to non-technical stakeholders

Advantageous Skills:

  • Experience with ITSM processes (Incident, Problem, Change) and tools like ServiceNow
  • Experience with database operations and performance tuning (PostgreSQL, MySQL, MongoDB)
  • Knowledge of service mesh technologies (e.g. Istio)
  • Experience with chaos engineering or resilience testing
  • Familiarity with FinOps practices and cloud cost governance at scale
  • Experience with Technical Lifecycle Management (TLM) - upgrades, deprecations, migrations
  • Knowledge of AI-assisted DevOps tools and willingness to adopt AI4DevOps practices
  • Familiarity with Jira and Confluence for tracking and documentation
  • Experience in automotive or enterprise-scale environments

Key Responsibilities

Infrastructure & Cloud (35%):

  • Design, build, and maintain scalable, secure, and cost-efficient cloud infrastructure on AWS
  • Manage and evolve Kubernetes clusters, including upgrades, capacity planning, and cluster health
  • Build and maintain infrastructure-as-code modules for repeatable, auditable deployments
  • Drive cloud cost optimization - identify waste, right-size resources, implement savings plans
  • Ensure infrastructure meets non-functional requirements: performance, scalability, availability, and disaster recovery

CI/CD & Automation (25%):

  • Build, operate, and continuously improve CI/CD pipelines for fast, safe, and reliable delivery
  • Automate repetitive operational tasks and reduce toil through tooling and runbooks
  • Maintain and improve deployment automation - zero-touch deployments are the goal
  • Drive adoption of best practices across development teams
  • Own deployment runbooks and ensure they are up to date and tested

Security Implementation (20%):

  • Implement and maintain security scanning in CI/CD pipelines (SAST, DAST, container image scanning)
  • Harden container and cloud infrastructure security (network policies, IAM, secrets, encryption)
  • Translate security audit findings into concrete technical actions and execute them
  • Drive vulnerability remediation - track, prioritize, and fix security issues with urgency
  • Ensure compliance with Group IT security standards and policies

Monitoring, Reliability & Incident Response (15%):

  • Implement and own monitoring, logging, and alerting for proactive issue detection
  • Build dashboards that give real-time visibility into system health and performance
  • Lead incident response for infrastructure-related issues - diagnose fast, fix fast
  • Conduct post-incident reviews and drive corrective actions to prevent recurrence
  • Continuously improve system reliability, uptime, and mean time to recovery (MTTR)

Technical Optimization & Lifecycle Management (5%):

  • Drive Technical Lifecycle Management (TLM) - plan and execute upgrades and migrations
  • Identify and implement technical optimizations across the stack
  • Contribute to technical strategy and roadmap for platform engineering
  • Actively use and promote AI4DevOps tools and practices where they add real value

What Does Success Look Like?

  • Infrastructure: reliable, scalable, cost-optimized - no surprises
  • CI/CD: fast, safe pipelines - developers ship with confidence
  • Security: vulnerabilities found early, fixed fast - no excuses
  • Incidents: quick response, thorough root cause analysis, and things get better over time
  • Automation: if you did it twice manually, the third time it's automated
  • Delivery: you ship improvements continuously - not just plans, but results

We don't need someone who writes documents about how things should be done. We need someone who rolls up their sleeves and makes things better - every single day.

NB:

  • South African citizens/residents are preferred.
  • Applicants with valid work permits will also be considered.
  • By applying, you consent to being added to the database and receiving updates until you unsubscribe.
  • If you do not receive a response within 2 weeks, please consider your application unsuccessful.

#iSanqa #Group #ReliabilityEngineer #SRE #DevOps #AWS #Kubernetes #Terraform #InfrastructureEngineering #GautengJobs #HybridWork #HiringNow

iSanqa is your trusted Level 2 BEE recruitment partner, dedicated to continuous improvement in delivering exceptional service. Specializing in seamless placements of permanent staff and temporary resources, as well as efficient contract management and billing facilitation, iSanqa Resourcing is powered by a team of professionals with an outstanding track record. With over 100 years of combined experience, we are committed to evolving our practices to ensure ongoing excellence.
