We are looking for you if:
- You have 5 years of experience in software engineering, IT operations, and/or software architect roles, and 3 years in SRE roles or as an Observability engineer
- You are knowledgeable about technology at all levels of the technology stack (from infrastructure to front-end, from CI/CD to observability tooling)
- You have expert knowledge of, and hands-on experience with, one or more of these levels (e.g. infrastructure and back-end development and/or observability and CI/CD tooling)
- You are experienced with ING Private Cloud (if you are an internal ING candidate) or public cloud (Azure or Google Cloud) and the related VM/container stacks and tooling, as well as the application-level technologies and tooling heavily in use at ING, e.g. Spring Boot, Java, Pega, TIBCO, lit-html, and JavaScript
- You have proven knowledge of and expertise in the observability landscape, e.g. Prometheus, the ELK stack, and distributed tracing
- You have experience with the open-source way of working and are an advocate for Observability engineering within ING.
- You have excellent oral and written English skills
You will get extra points for:
- An understanding of software architecture and expertise as a Site Reliability engineer, with good hands-on experience in application-side observability.
- Ability to discuss features and stories with your stakeholders and product owner, and to translate them into software with real business value
- Strong analytical skills to discover patterns in observability data and post-mortems and turn them into actionable intelligence.
- Ability to lead and inspire a group of senior developers on the subject matter.
Your responsibilities:
You develop, innovate, mature, and implement the Observability practices and related processes across ING, in cooperation with the entity-level SRE teams and the Observability Platform team, with the overall purpose of improving the reliability of our critical IT and business services. You ensure proper documentation, training material, and other ways of getting this knowledge to our engineers across ING. The initial focus will be on maturing ING's observability strategy and ensuring the implementation of best practices across ING.
- Operate in strong cooperation with the involved Enterprise Architects, entity SREs and engineers, and the Observability Platform engineering team
- Focus on the observability of our critical business services (critical chains and global services with the biggest availability issues), getting everything in place from availability monitoring and tracing of the critical service chain to monitoring and alerting for critical components and applications
- Shaping the observability landscape: define and execute the long-term observability roadmap with stakeholders such as architects and leadership, aligning it with ING's objectives for system reliability, scalability, and operational efficiency
- Driving observability improvements: work with the observability architect(s) to improve application observability and enable visibility into complex distributed systems
- Cross-functional leadership: work closely with the Observability Platform team, local SRE teams, and application teams to embed observability into every layer of system design and development
- Maturing the observability landscape: support the stream-aligned feature teams in reaching the right level of observability by assessing organizational maturity, including identifying gaps in current delivery and needs
Information about the team:
The Observability Engineering team ensures observability by standardizing engineering practices and tool integration, creating transparent systems that are observable based on their internal state. This transparency enables teams to respond promptly to prevent incidents and to minimize MTTR if an incident occurs. Observability Engineering ensures systems operate smoothly and efficiently, minimizing downtime and performance issues through proactive monitoring, anomaly detection, and capacity management. Additionally, it ensures effective capacity planning, building on capacity management and forecasting practices, and provides valuable insights that inform strategic decisions and optimize operations with comprehensive metrics, logs, and traces.