Senior Observability Specialist
Johannesburg - South Africa
Job Summary
Why This Role Matters
In a world where milliseconds matter and reliability is non-negotiable, observability is the superpower that keeps enterprise systems alive, agile, and trusted.
As our Senior Observability Specialist, you will architect the nervous system of our digital estate, bringing clarity to complexity across hybrid cloud, distributed platforms, and mainframe environments. Your work will ensure incidents are prevented before they happen, detected faster than ever, and resolved with confidence and speed.
You won't just monitor systems; you will illuminate them, championing Site Reliability Engineering (SRE) principles, automation-first thinking, and a truly measure-first culture.
Your Mission
To deliver enterprise-grade observability solutions that provide real-time insights into system health, performance, and resilience, using best-in-class platforms like Dynatrace and ServiceNow.
You'll partner with technology and business leaders to design scalable, integrated monitoring ecosystems that empower teams, reduce noise, and turn data into decisive action.
What You'll Be Doing
Design & Build World-Class Observability
- Architect and implement end-to-end observability across applications, infrastructure, and services
- Define and set enterprise standards for monitoring and event management
- Instrument services using logs, metrics, traces, telemetry, APM, RUM, and synthetics
- Design synthetic user journeys for proactive early-warning detection
- Build high-signal alerts and visually compelling dashboards that matter
- Ensure scalability, availability, and recoverability across hybrid cloud environments
Integrate, Automate, Accelerate
- Integrate observability platforms with ServiceNow Event Management
- Design event normalization, enrichment, CI binding, and noise reduction strategies
Drive Operational Excellence
- Strengthen operational readiness with clear procedures and response plans
- Partner with operations teams to close monitoring gaps and refine incident response
Champion Continuous Improvement
- Research emerging observability tools, trends, and practices
- Design advanced cross-technology observability solutions at enterprise scale
Lead, Influence & Enable
- Serve as an observability and SRE consultant across enterprise initiatives
- Support SRE practices, including SLIs, SLOs, error budgets, and post-incident reviews
Experience & Expertise
- 8-10 years of IT experience, with 7-10 years in monitoring, APM, observability, or SRE at enterprise scale
- Deep hands-on expertise with Dynatrace, ServiceNow, APM/Event Management, or equivalent platforms
- Strong background in event management design, ITSM integration, and automation
- Proven experience designing SLIs, SLOs, and alert strategies, and driving MTTR reduction
- Production experience in AWS, Azure, or GCP, including Kubernetes and containerized environments
Technical Strengths
- Strong programming logic and scripting expertise
- Ability to design solutions that span complex, interdependent technologies
- Solid understanding of hybrid cloud and distributed system architectures
Qualifications
- IT Diploma / Degree or equivalent practical experience
The Way You Work
- Strategic thinker who sees the big picture while mastering the details
- Calm under pressure, leading incidents with clarity and confidence
- Able to influence without authority, coaching teams to shift left and own their SLOs
- Collaborative across all organizational levels
- Emotionally intelligent, decisive, resilient, and quality-driven
The Impact You'll Make
You'll redefine how reliability is engineered, how incidents are experienced, and how visibility empowers teams. Your influence will be felt across platforms, portfolios, and people, turning observability into a competitive advantage.
Please contact the Nedbank Recruiting Team at
Required Experience:
Senior IC
About Company
Stay connected to your wealth and us via our secure Nedbank Private Wealth site. You can transact and check your latest balances.