Job Summary
We are seeking an experienced SRE Engineer with strong expertise in application and infrastructure monitoring, observability, and automation. The role involves building and maintaining monitoring solutions with Nagios and modern observability tools (AppDynamics, Splunk, Dynatrace, New Relic, Prometheus, Grafana, ELK, etc.), automating operational tasks with Python/Shell, and contributing to Jenkins pipeline development. The ideal candidate is proactive, analytical, and passionate about ensuring performance, stability, and reliability in production environments.
Key Responsibilities
Monitoring & Observability
Implement and manage Nagios monitoring (servers, services, networks, applications).
Build custom Nagios plugins and alerts for proactive issue detection.
Deploy and optimize observability solutions (APM, logging, tracing, metrics/dashboards).
Integrate multiple tools for end-to-end system visibility and reliable alerting.
Automation & CI/CD
Automate provisioning, deployments, and incident response with Python/Shell scripting.
Integrate monitoring with ticketing/ops platforms.
Contribute to Jenkins pipeline development for automation and CI/CD.
Performance & Troubleshooting
Analyze monitoring data to detect performance issues and optimize systems.
Collaborate with Dev/Ops teams to troubleshoot complex problems.
Participate in on-call rotations for incident response.
Linux Administration
Perform system hardening, patching, performance tuning, and troubleshooting.
Apply best practices to ensure system security, reliability, and scalability.
Continuous Improvement
Maintain documentation for monitoring setups, automation, and processes.
Advocate best practices and stay current with monitoring/observability trends.
Required Skills & Qualifications
8 years of experience in SRE, Linux Administration, or Monitoring Engineering.
Strong expertise in Linux internals (kernel, networking, security, performance).
Hands-on experience with Nagios (setup, configuration, plugins, alerts).
Experience with at least two observability domains: APM, Logging, Tracing, Metrics.
Strong Python/Shell scripting for automation and integrations.
Experience with Git and collaborative troubleshooting.
Excellent communication and teamwork skills.
Preferred Skills
Additional observability tools (Dynatrace, New Relic, Prometheus, Grafana).
Experience with cloud platforms (AWS, Azure, GCP) and Kubernetes/Docker.
Familiarity with CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI).
Full-time