Junior role - 2-4 years of experience
Manager loves candidates who come from community colleges
Personality and soft skills are key! Interviewers will explore candidates' career progression and their reasons for entering this field. Candidates need to sell themselves well.
Production Support Engineer, Observability & Monitoring
Position Summary
We are seeking a proactive and detail-oriented Production Support Engineer, Observability & Monitoring, to help improve the reliability, observability, and operational readiness of our production systems. This role will focus on developing and maintaining effective monitoring strategies, ensuring alerts are actionable, and supporting incident response activities across enterprise applications.
The engineer will work closely with application, infrastructure, and operations teams to review existing monitoring practices, improve alert accuracy, and ensure proper documentation for operational response. This role plays a key part in strengthening monitoring maturity, reducing alert noise, and enabling faster detection and resolution of production issues.
Key Responsibilities
Continuously evaluate and refine alerts to reduce unnecessary alert frequency, eliminate duplicate or noisy alerts, and ensure each alert is tied to a meaningful operational action.
Monitor system alerts and respond to production incidents, ensuring issues are properly tracked and managed through ServiceNow.
Support 24x7 production monitoring and incident response, assisting with investigation and troubleshooting of system issues.
Partner closely with application, platform, and database teams to improve alert accuracy and actionability by refining alert logic, thresholds, and dependencies and by improving telemetry coverage.
Analyze monitoring data, logs, and performance metrics to identify trends, anomalies, and potential system risks.
Assist in documenting operational procedures, monitoring guidelines, and incident response runbooks.
Participate in continuous improvement initiatives to strengthen monitoring standards and operational maturity.
Provide recommendations to enhance system visibility, alert management, and overall monitoring effectiveness.
Support post-incident reviews and help implement monitoring improvements based on lessons learned.
Required Qualifications
2-4 years of experience in production support, monitoring, or operational engineering roles.
Experience working with monitoring and observability tools, such as application performance monitoring, log analysis, and infrastructure monitoring platforms, including AppDynamics, Splunk, Grafana/Prometheus, and Azure Monitor/Log Analytics.
Basic understanding of system health monitoring, alert management, and incident response processes.
Strong troubleshooting and analytical skills with the ability to investigate issues using monitoring data and system logs.
Ability to collaborate effectively with cross-functional teams, including engineering, infrastructure, and support teams.
Strong documentation and communication skills.
Preferred Qualifications
Experience supporting enterprise production systems or large-scale applications.
Familiarity with monitoring frameworks and observability practices such as metrics, logs, and system performance analysis.
Exposure to cloud-based or distributed systems environments.
Understanding of Agile or DevOps-based operational practices.
Ability to identify monitoring gaps and recommend improvements to enhance system reliability.
Education
Bachelor's degree in Computer Science, Information Technology, Engineering, or equivalent relevant experience.