OM Bank: Site Reliability Engineer

Employer Active

1 Vacancy
Job Location

Johannesburg - South Africa

Monthly Salary

Not Disclosed

Job Description

Let's Write Africa's Story Together!

Old Mutual is a firm believer in the African opportunity and our diverse talent reflects this.

The Site Reliability Engineer will be responsible for ensuring the reliability, scalability, and performance of our digital banking infrastructure. You will work closely with software engineers, Platform engineers, and the security team to proactively prevent issues, resolve incidents, and optimise system health.

This role requires a mix of technical expertise, automation skills, and operational discipline to deliver high availability and performance for critical banking services.

KEY RESULT AREAS

Reliability and Performance Monitoring

  • Implement and maintain monitoring and alerting systems to track key performance indicators (KPIs) for uptime, latency, and system health.
  • Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure critical systems meet reliability standards.

Incident Management

  • Participate in on-call rotations and lead response efforts to quickly resolve system incidents, minimizing downtime and customer impact.
  • Conduct root cause analysis for incidents, create post-incident reports, and implement corrective actions.

Automation and Infrastructure as Code (IaC)

  • Develop and maintain automation scripts and tools to streamline operational tasks, including incident response and system provisioning.
  • Implement Infrastructure as Code (e.g. Terraform, Ansible) to manage and scale infrastructure reliably and repeatably.

Continuous Improvement of CI/CD Pipelines

  • Collaborate with the Platform engineering team to enhance CI/CD pipelines, reducing deployment time and improving stability.
  • Implement canary and blue-green deployments, rollbacks, and automated testing to ensure reliable releases.

Capacity Planning and Scalability

  • Analyze system performance and usage patterns to anticipate growth needs, ensuring infrastructure is prepared for peak traffic and future scaling.
  • Conduct capacity planning and make recommendations for resource allocation, balancing performance and cost.

Observability and Logging

  • Implement and maintain observability tools (e.g. Prometheus, Grafana, ELK Stack) to gain insights into system behavior and proactively identify issues.
  • Ensure that logging, metrics, and traceability are set up to enable comprehensive debugging and troubleshooting.

Disaster Recovery and Business Continuity

  • Contribute to the design and testing of disaster recovery plans to ensure fast recovery of critical services in the event of major incidents.
  • Regularly test backup and recovery processes to ensure data integrity and system continuity.

Security and Compliance Collaboration

  • Work with the security team to ensure compliance with banking regulations (e.g. PCI-DSS, GDPR, POPIA) and implement security best practices in system design.
  • Monitor for and respond to security alerts to maintain a secure infrastructure.

Key Performance Indicators (KPIs):

  • System Uptime (Availability): Aim for 99.95% or higher for critical systems.
  • Mean Time to Recovery (MTTR): Target rapid resolution times for incidents (e.g. under 30 minutes).
  • Incident Volume and Severity: Reduction in the number and severity of incidents over time.
  • Change Failure Rate: Percentage of changes that result in incidents, aiming to keep it below 5%.
  • Automated Task Percentage: Proportion of operational tasks automated, improving efficiency and reducing manual errors.
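
As a rough illustration of what the availability and change failure targets above imply, the arithmetic can be sketched as follows. The 30-day measurement window, the function names, and the sample change counts are assumptions for illustration only; the posting does not specify how these KPIs are measured.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, window: timedelta) -> timedelta:
    """Downtime budget implied by an availability target over a window."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return timedelta(seconds=window.total_seconds() * unavailable_fraction)


def change_failure_rate(failed_changes: int, total_changes: int) -> float:
    """Percentage of changes that resulted in an incident."""
    if total_changes == 0:
        return 0.0
    return 100.0 * failed_changes / total_changes


# Assumed 30-day window; the 99.95% target comes from the KPI list.
budget = allowed_downtime(99.95, timedelta(days=30))
print(f"Downtime budget: {budget.total_seconds() / 60:.1f} minutes")

# Hypothetical sample: 2 of 50 changes caused an incident.
print(f"Change failure rate: {change_failure_rate(2, 50):.1f}%")
```

Under these assumptions, a 99.95% target over 30 days leaves about 21.6 minutes of downtime, so a single incident that takes the full 30 minutes to resolve would already exceed the monthly budget.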

ROLE REQUIREMENTS

  • Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.

Preferred Qualifications:

  • Previous experience in the financial services industry or within regulated environments.
  • Certification in relevant areas, such as AWS Certified SysOps Administrator or Certified Kubernetes Administrator (CKA).
  • Familiarity with security standards and practices relevant to financial services.

Experience:

  • 3 years of experience in Site Reliability Engineering, Platform engineering, or a related role, ideally within a high-availability or financial services environment.

Technical Skills:

  • Proficiency in monitoring and observability tools (e.g. Prometheus, Grafana, ELK, Datadog).
  • Strong scripting skills (Python, Bash, or similar) and experience with automation tools.
  • Familiarity with cloud platforms (AWS, GCP, Azure) and container orchestration tools (Docker, Kubernetes).
  • Experience with CI/CD tools and practices (e.g. GitHub Actions, GitLab CI/CD, ArgoCD).
  • Proficient in Infrastructure as Code (IaC) tools like Terraform or CloudFormation.

Soft Skills:

  • Strong problem-solving skills with the ability to troubleshoot complex systems under pressure.
  • Excellent communication and teamwork skills, with the ability to work cross-functionally with engineering, Platform engineering, and security teams.
  • A proactive mindset focused on reliability and continuous improvement.

Skills

Action Planning, Application Development, Business Process Design, Computer Literacy, Data Management, Data Modeling, Evaluating Information, Identifying Customer Needs, Information Technology (IT) Support, Market Analysis, Oral Communications, Product Development, Technical Support, Technical Troubleshooting, Test Case Management, User Requirements Documentation, Web Development

Competencies

Business Insight

Collaborates

Courage

Cultivates Innovation

Decision Quality

Drives Results

Ensures Accountability

Manages Complexity

Education

Closing Date

14 July 2025 23:59

The appointment will be made from the designated group in line with the Employment Equity Plan of Old Mutual South Africa and the specific business unit in question.

The Old Mutual Story!

Employment Type

Full-Time
