Key Responsibilities:
- Identify operational inefficiencies and automation opportunities within monitoring workflows and infrastructure.
- Design and implement automated solutions for deployment, configuration, and scaling of monitoring tools using Infrastructure-as-Code (IaC) technologies such as Terraform, Ansible, Puppet, or similar.
- Leverage REST APIs of platforms like Zabbix, SolarWinds, Prometheus, and Grafana to streamline and standardize monitoring setup and management.
- Develop reusable automation assets (scripts, templates, and modules) to ensure consistent monitoring practices across diverse environments.
- Integrate monitoring systems with alerting, ticketing, and reporting platforms to enable seamless incident management and visibility.
- Establish tagging strategies and observability standards to ensure uniform data collection and traceability across services.
- Support incident response by building automated diagnostics and enriching telemetry data for faster root cause analysis.
- Collaborate cross-functionally with DevOps and SRE teams to align monitoring automation with CI/CD pipelines and operational goals.
Tech Skills:
Infrastructure as Code (IaC) & Automation
- Terraform
- Ansible
- Puppet
- Scripting and remote execution: Python, Bash, PowerShell, SSH
Monitoring & Observability Tools
- Zabbix
- SolarWinds
- Prometheus
- Grafana
- Datadog, New Relic, or Dynatrace (as alternatives or complementary tools)
API Integration & Automation
- Experience working with REST APIs for automation and integration
- Familiarity with JSON, YAML, and HTTP methods (GET, POST, PUT, DELETE)
CI/CD & DevOps Tooling
- Jenkins, GitLab CI, GitHub Actions, or similar
- Docker and Kubernetes (for containerized environments)
Alerting & Incident Management Integration
- ServiceNow, Jira, VictorOps, xMatters, or similar
- Knowledge of event correlation and automated diagnostics
Cloud Platforms (optional)
- AWS, Azure, or Google Cloud Platform
- Cloud-native monitoring tools like CloudWatch, Azure Monitor, or GCP Operations Suite
Preferred Qualifications:
Soft Skills & Operational Mindset
- Strong problem-solving and gap analysis capabilities
- Ability to identify low-hanging fruit for automation
- Experience in cross-functional collaboration (DevOps, SRE, IT Ops)
- Understanding of observability principles and tagging strategies
Required Experience:
Staff IC