Cloud Engineer (Observability)
Job Summary
Are you a passionate, self-driven information technology (IT) expert who loves to design and implement observability solutions that empower both IT and the business? Are you driven to ensure that systems and applications are highly observable, that failures are detected promptly, and that automation kicks in instead of manual dependencies? Have you managed the smooth operation of critical systems and software applications in production environments?
We're looking for an Observability Architect to:
- Design and implement observability solutions for IT systems and applications.
- Coordinate with IT teams to understand observability requirements.
- Diagnose, identify, and execute steps to improve platform observability for a variety of applications.
- Create, maintain, and update system documentation and operational procedures related to observability.
- Monitor infrastructure alerts and create anomaly detection patterns.
- Create reusable observability implementations that can be scaled out to multiple applications and scenarios, thereby improving observability maturity and reducing incidents.
- Take ownership of and resolve open requirements for observability patterns.
- Define an observability taxonomy and reusable templates for adopting the established taxonomy.
- Identify and implement automation opportunities to enhance observability and recoverability.
Your team
We are building an Observability Platform that will not only provide IT insights but also help the business understand the business impact of IT issues. The platform aims to build and enhance monitoring and observability maturity for the LOB.
Your expertise
7-10 years of experience as an Observability Architect or in a similar role within the financial sector.
Expertise in designing and implementing observability platforms for large, geographically distributed systems and applications.
Good understanding of cloud services (Azure preferred), including IaaS, PaaS, and SaaS, and of networking concepts.
Good understanding of version control tools like GitLab and of CI/CD processes.
Awareness of system and design thinking concepts.
Good understanding of Incident and Problem Management processes and integrations.
Experience using AIOps in business-as-usual (BAU) operations and with SRE practices would be beneficial.
Good problem-solving and analytical skills, with the ability to break down problems and identify resolution steps.
Ability to partner with other IT and business teams on issue resolution. Comfortable working in a distributed team with diverse backgrounds and cultures.
Familiarity with cloud-native observability solutions.
Preferred hands-on experience with the following:
- Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
- Knowledge of containerization technologies like Docker and Kubernetes.
- Experience with automation tools like Ansible, Terraform, or similar.
- Proficiency with a scripting language such as PowerShell, Bash, or Python.
- Proficiency with REST APIs and batch loading / integration techniques.
- Proficiency with programming languages such as C#, Java, or Golang.
- Proficiency with databases and their observability, e.g. SQL / PostgreSQL.
- Strong understanding of network protocols and security best practices.
- Certification in relevant technologies or frameworks.