What you'll do
- Lead incident response and postmortems: drive investigations, document learnings, and implement permanent fixes to prevent recurrence.
- Manage and optimize Azure Kubernetes environments: own cluster configurations, performance, cost control, and security best practices.
- Build observability systems: develop dashboards, alerts, and metrics using Dynatrace, Honeycomb, Elasticsearch, Grafana/Kibana, and Azure Monitor (KQL).
- Automate for resilience: write reliable scripts in PowerShell, Bash, Python, or C#, embedding logging, rollback, and version control.
- Implement Infrastructure-as-Code: design and maintain Terraform, Bicep, or ARM templates to standardize and automate deployments.
- Optimize system performance: identify bottlenecks through deep monitoring, dump analysis, and right-sizing of cloud resources.
- Collaborate across engineering teams: integrate reliability principles into CI/CD pipelines and the broader SDLC.
- Participate in on-call rotations: lead during critical incidents, ensuring lasting fixes and operational excellence.
Qualifications:
What you'll bring
- 5 years in SRE, DevOps, or Cloud Infrastructure roles, with experience in large-scale distributed systems.
- Strong Azure and Kubernetes expertise (production-level).
- Proven ability in observability engineering using Dynatrace, Honeycomb, Elastic, Grafana/Kibana, or Azure Monitor.
- Skilled in PowerShell, Bash, Python, or C#, with an automation-first mindset.
- Proficient in Infrastructure-as-Code (Terraform, Bicep, ARM).
- Solid grasp of TCP/IP networking fundamentals and performance tuning.
- Strong communicator, able to translate complex technical findings into clear, actionable insights.
- Certifications preferred:
  - Microsoft Certified: Azure Administrator Associate
  - Certified Kubernetes Administrator (CKA)
Why you'll love working here
- Impact from day one: join a scale-up where your ideas shape how global businesses operate online.
- Continuous learning: access a structured onboarding rated 9.1/10 by previous hires, plus mentorship and a feedback culture.
- Hybrid flexibility: work from our Cape Town office 3 days per week and from home 2 days.
- Career growth: expand your technical and leadership scope in a company built for long-term success.
Our values
At Sana Commerce, our values drive everything we do:
- Champions of Our League: we deliver lasting success, balancing quick wins and long-term value.
- Supercharge Our Customers: we help our customers lead and succeed.
- Determined to Grow: we embrace feedback and challenges to raise the bar.
- Bold Together: we take risks, collaborate deeply, and support each other.
Ready to build reliability that scales?
Apply now and help shape the foundation of our next-generation SaaS platform.
Additional Information: #LI-Hybrid
Remote Work: No
Employment Type: Full-time