Location: Harrisburg, PA
Position Type: Hybrid
Hybrid Schedule: At least one day onsite a week
Contract Length: Long-term with annual extensions
Position Overview:
This role is a subject matter expert position responsible for enterprise monitoring, observability, and automation initiatives that improve operational visibility, service reliability, and incident response. The position focuses on modernizing monitoring processes through automation, standardized workflows, and strong IT service management practices across hybrid infrastructure environments.
Required Skills:
5 years of experience in IT infrastructure monitoring, automation, and observability within hybrid environments.
Strong proficiency in PowerShell and at least one additional scripting language, such as Python, SQL, or Bash.
Hands-on experience using Azure Monitor, Log Analytics, Ansible, SQL, and KQL for monitoring and analytics.
Experience implementing automation solutions using Azure Automation and CI/CD pipelines.
Expertise with enterprise monitoring platforms such as SCOM and SquaredUp, or equivalent tools such as Dynatrace, Datadog, or Splunk.
Knowledge of API integrations and secure authentication methods.
Experience utilizing ServiceNow or similar IT Service Management (ITSM) platforms.
Preferred Skills:
Duties:
Drive process and tooling improvements by identifying operational gaps and implementing automation-first solutions that reduce manual effort and enhance service quality.
Maintain endpoint monitoring connectivity by managing telemetry ingestion through agents, SNMP, WMI, and APIs, and by administering credentials and certificates securely.
Develop, maintain, and organize documentation, including runbooks, SOPs, service maps, and workflows, in version-controlled repositories.
Document incidents and problems using monitoring and observability data, produce post-incident reviews, and maintain a Known Error Database.
Collaborate with change, incident, and problem management teams to ensure that standardized processes, risk assessments, and communication plans are followed.
Monitor resolution performance by tracking SLAs, MTTR, and root cause analysis effectiveness, and ensure that corrective actions are validated.
Implement standardized communication workflows for operational events and manage stakeholder notifications and self-service subscription options.
Ensure alignment with enterprise IT policies by recommending improvements that enhance reliability, security, and cost efficiency.
Use ServiceNow to create and manage Requests for Change, link risk assessments, and verify post-change monitoring health.
Produce SLA reports and operational metrics covering availability, incidents, and service improvements.
Design, test, and maintain disaster recovery plans, including defining RTO/RPO targets and conducting periodic recovery exercises.
Maintain technical expertise by staying current on emerging monitoring technologies, tools, and industry best practices.
Support continuity operations during critical incidents, including performing assigned duties at alternate operational sites when required.
Adhere to ITIL-aligned service management processes and contribute to process maturity initiatives and compliance audits.
Required Experience:
Senior IC