Responsibilities
- Platform Ownership
- Network & Monitoring Tools (must have)
- Familiar with tools such as SolarWinds (including NetPath). As a platform owner ensure platform stability upgrades patching and day-to-day support.
- Knows network-centric monitoring capabilities including SNMP polling traps and device visibility. Ensure new sites and devices are properly onboarded
- Partner with platform and cloud teams to ensure migrated workloads meet monitoring standards. Systems Administration (must have)
- Provide sysadmin support for Linux and Windows servers including:
- Agent deployment and upgrades (SolarWinds Datadog Dynatrace)
- OS level troubleshooting and configuration
- Monitoring and logging enablement
- Support hybrid environments spanning on-prem and Azure infrastructure.
- A developer mindset with experience in Dev workflow GitHub PowerShell etc.
- Observability & Event Management Support (should have)
- Has experience with tools such as Datadog and Dynatrace. The person will be responsible for collaborating with platform owners to support integrations data quality and alerting hygiene.
- Assist with event management workflows ensuring alerts are actionable and routed correctly.
- Participate in efforts to reduce alert noise and repeat incidents. SIEM & Security Visibility (nice to have)
- Develop a working understanding of SIEM concepts and platforms such as Azure Sentinel and CRIBL.
- Support log ingestion troubleshooting and collaboration with security and incident response teams.
- Ensure infrastructure and network telemetry support security detection requirements. Cloud Monitoring & Azure Integration (should have)
- Has experience with the Azure cloud platform. Have either directly supported or is familiar with Azure-based monitoring and logging including:
- Azure Monitor and Log Analytics integrations
- Observability for Azure-hosted workloads Automation AI & Continuous Improvement (nice to have)
- Explore and apply AI-assisted features within monitoring event management and SIEM tools to:
- Improve signal quality / reduce alert fatigue
- Support faster incident triage
- Contribute to documentation runbooks and operational improvements focused on small incremental wins.
- Knowledge Transfer & Operational Resilience
- Participate in knowledge transfer activities related to platform transitions and retirements. Maintain documentation.
- Support on call or escalation rotations as needed.
Must have
- Minimum 4-5 years of experience in infrastructure operations monitoring observability or platform operations roles supporting enterprise environments
- Hands-on experience with systems administration for Linux and Windows servers including troubleshooting configuration and deployment of monitoring or management agents (e.g. SolarWinds Datadog Dynatrace).
- Foundational networking knowledge including concepts such as SNMP network monitoring LAN/WAN fundamentals firewalls and telemetry collection sufficient to support network-centric monitoring platforms like SolarWinds
- Not a must but nice to have experience with a platform like StruxureWare.
- Experience with observability or monitoring platforms such as SolarWinds Datadog Dynatrace or similar tools with an understanding of alerting dashboards and signal quality.
- Exposure to cloud environments preferably Microsoft Azure including familiarity with monitoring and logging concepts (e.g. cloud-based telemetry logs metrics and integrations).
- Basic understanding of incident and event management practices including alert triage escalation and collaboration with incident response or operations teams.
- Demonstrated willingness and ability to learn new technologies quickly with examples of picking up new platforms tools or domains outside of prior core expertise.
- Familiarity with Agile or SAFe ways of working including collaboration in sprint-based delivery models and cross-functional team engagement is a plus.
- Strong communication and collaboration skills with the ability to work effectively with platform owners operations teams security teams and external stakeholders.
- Experience working in a modern Dev workflow using GitHub (branches pull requests code reviews and CI/CD) to manage and deploy scripts/automation used for platform operations
- Working proficiency in scripting languages such as PowerShell Python BASH or similar scripting languages.
- Knowledge of Azure Azure Active Directory (AD) and hybrid cloud environments is a plus.
- Exposure to SIEM concepts or platforms such as Azure Sentinel CRIBL or similar is a plus.
- Experience with change management practices in an enterprise IT environment is beneficial
Responsibilities Platform Ownership Network & Monitoring Tools (must have) Familiar with tools such as SolarWinds (including NetPath). As a platform owner ensure platform stability upgrades patching and day-to-day support. Knows network-centric monitoring capabilities including SNMP polling traps a...
Responsibilities
- Platform Ownership
- Network & Monitoring Tools (must have)
- Familiar with tools such as SolarWinds (including NetPath). As a platform owner ensure platform stability upgrades patching and day-to-day support.
- Knows network-centric monitoring capabilities including SNMP polling traps and device visibility. Ensure new sites and devices are properly onboarded
- Partner with platform and cloud teams to ensure migrated workloads meet monitoring standards. Systems Administration (must have)
- Provide sysadmin support for Linux and Windows servers including:
- Agent deployment and upgrades (SolarWinds Datadog Dynatrace)
- OS level troubleshooting and configuration
- Monitoring and logging enablement
- Support hybrid environments spanning on-prem and Azure infrastructure.
- A developer mindset with experience in Dev workflow GitHub PowerShell etc.
- Observability & Event Management Support (should have)
- Has experience with tools such as Datadog and Dynatrace. The person will be responsible for collaborating with platform owners to support integrations data quality and alerting hygiene.
- Assist with event management workflows ensuring alerts are actionable and routed correctly.
- Participate in efforts to reduce alert noise and repeat incidents. SIEM & Security Visibility (nice to have)
- Develop a working understanding of SIEM concepts and platforms such as Azure Sentinel and CRIBL.
- Support log ingestion troubleshooting and collaboration with security and incident response teams.
- Ensure infrastructure and network telemetry support security detection requirements. Cloud Monitoring & Azure Integration (should have)
- Has experience with the Azure cloud platform. Have either directly supported or is familiar with Azure-based monitoring and logging including:
- Azure Monitor and Log Analytics integrations
- Observability for Azure-hosted workloads Automation AI & Continuous Improvement (nice to have)
- Explore and apply AI-assisted features within monitoring event management and SIEM tools to:
- Improve signal quality / reduce alert fatigue
- Support faster incident triage
- Contribute to documentation runbooks and operational improvements focused on small incremental wins.
- Knowledge Transfer & Operational Resilience
- Participate in knowledge transfer activities related to platform transitions and retirements. Maintain documentation.
- Support on call or escalation rotations as needed.
Must have
- Minimum 4-5 years of experience in infrastructure operations monitoring observability or platform operations roles supporting enterprise environments
- Hands-on experience with systems administration for Linux and Windows servers including troubleshooting configuration and deployment of monitoring or management agents (e.g. SolarWinds Datadog Dynatrace).
- Foundational networking knowledge including concepts such as SNMP network monitoring LAN/WAN fundamentals firewalls and telemetry collection sufficient to support network-centric monitoring platforms like SolarWinds
- Not a must but nice to have experience with a platform like StruxureWare.
- Experience with observability or monitoring platforms such as SolarWinds Datadog Dynatrace or similar tools with an understanding of alerting dashboards and signal quality.
- Exposure to cloud environments preferably Microsoft Azure including familiarity with monitoring and logging concepts (e.g. cloud-based telemetry logs metrics and integrations).
- Basic understanding of incident and event management practices including alert triage escalation and collaboration with incident response or operations teams.
- Demonstrated willingness and ability to learn new technologies quickly with examples of picking up new platforms tools or domains outside of prior core expertise.
- Familiarity with Agile or SAFe ways of working including collaboration in sprint-based delivery models and cross-functional team engagement is a plus.
- Strong communication and collaboration skills with the ability to work effectively with platform owners operations teams security teams and external stakeholders.
- Experience working in a modern Dev workflow using GitHub (branches pull requests code reviews and CI/CD) to manage and deploy scripts/automation used for platform operations
- Working proficiency in scripting languages such as PowerShell Python BASH or similar scripting languages.
- Knowledge of Azure Azure Active Directory (AD) and hybrid cloud environments is a plus.
- Exposure to SIEM concepts or platforms such as Azure Sentinel CRIBL or similar is a plus.
- Experience with change management practices in an enterprise IT environment is beneficial
View more
View less