Site Reliability Engineer

Trigent Software Private Limited

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Description

We are seeking an experienced Site Reliability Engineer with strong expertise in application and infrastructure monitoring, observability, and automation. The role involves building and maintaining monitoring solutions with Nagios and modern observability tools (AppDynamics, Splunk, Dynatrace, New Relic, Prometheus, Grafana, ELK, etc.), automating operational tasks with Python/Shell, and contributing to Jenkins pipeline development. The ideal candidate is proactive, analytical, and passionate about ensuring performance, stability, and reliability in production environments.

Key Responsibilities

Monitoring & Observability

Implement and manage Nagios monitoring (servers, services, networks, applications).

Build custom Nagios plugins and alerts for proactive issue detection.

Deploy and optimize observability solutions (APM, logging, tracing, metrics/dashboards).

Integrate multiple tools for end-to-end system visibility and reliable alerting.

Automation & CI/CD

Automate provisioning, deployments, and incident response with Python/Shell scripting.

Integrate monitoring with ticketing/ops platforms.

Contribute to Jenkins pipeline development for automation and CI/CD.

Performance & Troubleshooting

Analyze monitoring data to detect performance issues and optimize systems.

Collaborate with Dev/Ops teams to troubleshoot complex problems.

Participate in on-call rotations for incident response.

Linux Administration

Perform system hardening, patching, performance tuning, and troubleshooting.

Apply best practices to ensure system security, reliability, and scalability.

Continuous Improvement

Maintain documentation for monitoring setups, automation, and processes.

Advocate best practices and stay current with monitoring/observability trends.

Required Skills & Qualifications

8 years of experience in SRE, Linux Administration, or Monitoring Engineering.

Strong expertise in Linux internals (kernel, networking, security, performance).

Hands-on experience with Nagios (setup, configuration, plugins, alerts).

Experience with at least two observability domains: APM, Logging, Tracing, Metrics.

Strong Python/Shell scripting for automation and integrations.

Experience with Git and collaborative troubleshooting.

Excellent communication and teamwork skills.

Preferred Skills

Additional observability tools (Dynatrace, New Relic, Prometheus, Grafana).

Experience with cloud platforms (AWS, Azure, GCP) and Kubernetes/Docker.

Familiarity with CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI).