JOB DESCRIPTION
Monitor Windows and Linux servers using enterprise monitoring tools (e.g., Zabbix, Nagios, SolarWinds).
Proactively identify and respond to service disruptions and performance issues. Track and report the health status of critical infrastructure components, including servers, storage, and network systems.
Perform basic troubleshooting and service restarts for Apache Tomcat and Java applications. Execute health checks, log analysis, and event correlation to detect early signs of potential incidents.
Escalate issues to the respective L2/L3 teams based on severity and impact. Maintain and update monitoring configurations, alerting thresholds, and dashboards. Document incidents, solutions, and monitoring runbooks.
Participate in an on-call rotation to provide 24/7 monitoring support for business-critical systems. Assist in capacity planning and proactive improvements to system performance and resilience.
Required Skills & Qualifications
Strong technical knowledge of Windows Server and Linux (RHEL/CentOS/Ubuntu) environments.
Ability to understand and restart system services and daemons.
Basic understanding of system logs, process management, and troubleshooting commands.
Experience with one or more infrastructure monitoring tools (e.g., Zabbix, Nagios, Prometheus).
Good understanding of networking basics (IP, DNS, latency, firewalls).
Knowledge of scripting (Bash/PowerShell) is a plus. Strong problem-solving skills and a proactive attitude.