As a Site Reliability Engineer (SRE) specializing in SAP Technology, you will focus on improving the reliability, performance, and scalability of SAP systems. This role centers on proactive incident prevention, automation, system observability, and close collaboration with cross-functional teams to ensure a smooth and consistent user experience. You'll work alongside product teams, architects, and support staff to implement best practices in automation, operational excellence, and incident response within a dynamic multi-cloud environment.
RESPONSIBILITIES:
- Proactive Incident Analysis & Operational Improvements:
- Analyze incident patterns and trends to gain insights into recurring issues, collaborating with product teams to drive their resolution.
- Proactively manage alerts, identify potential problems, and work with cross-functional teams to enhance reliability and performance.
- Collaborate with product teams to prioritize operational user stories focused on reliability and performance improvements.
- Document operational workflows and troubleshooting guides to support knowledge sharing and team efficiency.
- Complex Troubleshooting & Problem Management:
- Lead efforts to troubleshoot complex issues in collaboration with third-party and manufacturer partners, ensuring swift resolution and minimal downtime.
- Participate in crisis management and response to address critical incidents impacting SAP systems.
- Automation for Efficiency:
- Identify automation opportunities across operational tasks to improve efficiency and reduce manual workload.
- Collaborate with the cybersecurity team to integrate automated security enhancements into systems, operations, and infrastructure.
- Observability and Monitoring:
- Use insights from observability tools to optimize incident resolution times, improve application performance, and drive continuous improvement.
- Cross-functional Collaboration:
- Work closely with architects and engineering teams to improve system stability and reduce incidents through proactive solutions. Propose solution designs and architectures that improve application reliability or simplify processes.
- Engage with the Central SRE team, the SIAM (Service & Integration Management) manager, and the SRE Community of Practice to share best practices and leverage synergies.
REQUIREMENTS:
Must have:
- Bachelor's or master's degree in Computer Science, Information Technology, Engineering, or a related field.
- 5 years of hands-on experience in SAP Basis administration for SAP HANA systems.
- 3 years of hands-on experience in administering SAP systems running on Microsoft Azure.
- Experience in SAP BTP configuration and administration.
- Experience with Dynatrace or another observability system.
- Strong problem-solving skills and the ability to work under pressure during incidents.
- Excellent communication and collaboration skills, with the ability to coordinate across cross-functional teams.
- Fluency in English, with strong written and verbal communication skills.
Preferred:
- Relevant SAP certifications.
- Knowledge of SUSE Linux Enterprise Server (SLES) or another Unix-like operating system.
- Experience in automation or scripting for operational efficiency and incident response.
- Familiarity with Agile methodologies; SAFe (Scaled Agile Framework) certification is strongly preferred.
- Understanding of compliance and regulatory requirements.
- Understanding of country-specific delivery constraints.