Lead Support Analyst, Shared Services and Production Management, Information Technology (MNC Institutional Brokerage & Investment Services)
Posted on: 30+ days ago
Vacancies: 1
Job Summary
Key Areas of Responsibility
- Own and support monitoring and SRE operations, ensuring system reliability, availability, and performance.
- Build, enhance, and maintain monitoring solutions using ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, and Grafana.
- Develop, optimize, and maintain alerting rules, dashboards, and observability pipelines.
- Troubleshoot and resolve complex issues during major incidents, providing clear and timely communication.
- Troubleshoot Linux servers (RHEL 7/8/9), including upgrades, configurations, patching, and maintenance, while determining appropriate monitoring requirements for system changes.
- Analyze logs, investigate issues, and perform fault finding to identify performance exceptions.
- Collaborate with engineering, application, and infrastructure teams to improve system resilience, stability, security, efficiency, and scalability.
- Contribute to automation strategies, deployment processes, and continuous operational improvements.
- Participate in on-call rotations, including off-hours and scheduled weekend support.
- Participate in Disaster Recovery (DR) and Business Continuity Planning (BCP) drills.
- Continuously research and adopt modern monitoring and SRE tools and practices.
Requirements
- Strong experience with monitoring and observability platforms, including ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, Grafana, and Kibana.
- Hands-on experience building and implementing Prometheus pipelines, including exporters, scraping configurations, relabelling, metric routing, and integrations with long-term storage (e.g. VictoriaMetrics).
- Experience building and maintaining Logstash pipelines, including ingestion, parsing, filtering, enrichment, and routing of logs into Elasticsearch.
- Ability to design, build, and maintain Grafana and Kibana dashboards for metrics, logs, and performance analytics across distributed systems.
- Solid understanding of metrics, logging, alerting, dashboards, and observability pipelines.
- Strong Linux administration skills (RHEL 7/8/9), including troubleshooting, upgrades, configuration, patching, and performance optimization.
- Good understanding of SRE principles, high availability, scalability, incident management, and Disaster Recovery (DR) / Business Continuity Planning (BCP) activities.
- Experience with automation (e.g. Bash, Python, Ansible, CI/CD tools) is an advantage.
- Understanding of networking fundamentals, performance tuning, and troubleshooting distributed systems.
- Prior experience in Production Support, SRE, Monitoring Engineering, or Shared Services Operations, with participation in on-call rotations, including after-hours and weekend support.
- Strong analytical, problem-solving, and communication skills, with the ability to work collaboratively under pressure.
- Self-motivated, adaptable, and able to prioritize, learn continuously, and manage multiple responsibilities effectively.
Candidate profile:
- Bachelor's degree in computer science / engineering
- Minimum 8 years' experience within IT / investment banking
- Fluent in English
Required Skills:
SRE Operations, ITRS Geneos, Prometheus, VictoriaMetrics