Senior Observability & Platform Engineer

Job Location: Deerfield Beach, FL - USA

Monthly Salary: Not Disclosed
Posted on: 2 hours ago
Vacancies: 1

Job Summary

Responsibilities
  • Platform Ownership
  • Network & Monitoring Tools (must have)
  • Be familiar with tools such as SolarWinds (including NetPath). As platform owner, ensure platform stability, upgrades, patching, and day-to-day support.
  • Know network-centric monitoring capabilities, including SNMP polling, traps, and device visibility. Ensure new sites and devices are properly onboarded.
  • Partner with platform and cloud teams to ensure migrated workloads meet monitoring standards.
  • Systems Administration (must have)
  • Provide sysadmin support for Linux and Windows servers, including:
    • Agent deployment and upgrades (SolarWinds, Datadog, Dynatrace)
    • OS-level troubleshooting and configuration
    • Monitoring and logging enablement
  • Support hybrid environments spanning on-prem and Azure infrastructure.
  • Bring a developer mindset, with experience in Dev workflows, GitHub, PowerShell, etc.
  • Observability & Event Management Support (should have)
  • Has experience with tools such as Datadog and Dynatrace. The person will be responsible for collaborating with platform owners to support integrations, data quality, and alerting hygiene.
  • Assist with event management workflows, ensuring alerts are actionable and routed correctly.
  • Participate in efforts to reduce alert noise and repeat incidents.
  • SIEM & Security Visibility (nice to have)
  • Develop a working understanding of SIEM concepts and platforms such as Azure Sentinel and CRIBL.
  • Support log ingestion troubleshooting and collaboration with security and incident response teams.
  • Ensure infrastructure and network telemetry support security detection requirements.
  • Cloud Monitoring & Azure Integration (should have)
  • Has experience with the Azure cloud platform. Has either directly supported or is familiar with Azure-based monitoring and logging, including:
    • Azure Monitor and Log Analytics integrations
    • Observability for Azure-hosted workloads
  • Automation, AI & Continuous Improvement (nice to have)
  • Explore and apply AI-assisted features within monitoring, event management, and SIEM tools to:
    • Improve signal quality / reduce alert fatigue
    • Support faster incident triage
  • Contribute to documentation, runbooks, and operational improvements focused on small, incremental wins.
  • Knowledge Transfer & Operational Resilience
  • Participate in knowledge transfer activities related to platform transitions and retirements. Maintain documentation.
  • Support on-call or escalation rotations as needed.
Must have
  • Minimum 4-5 years of experience in infrastructure operations, monitoring, observability, or platform operations roles supporting enterprise environments.
  • Hands-on experience with systems administration for Linux and Windows servers, including troubleshooting, configuration, and deployment of monitoring or management agents (e.g., SolarWinds, Datadog, Dynatrace).
  • Foundational networking knowledge, including concepts such as SNMP, network monitoring, LAN/WAN fundamentals, firewalls, and telemetry collection, sufficient to support network-centric monitoring platforms like SolarWinds.
  • Not a must, but experience with a platform like StruxureWare is nice to have.
  • Experience with observability or monitoring platforms such as SolarWinds, Datadog, Dynatrace, or similar tools, with an understanding of alerting, dashboards, and signal quality.
  • Exposure to cloud environments, preferably Microsoft Azure, including familiarity with monitoring and logging concepts (e.g., cloud-based telemetry, logs, metrics, and integrations).
  • Basic understanding of incident and event management practices, including alert triage, escalation, and collaboration with incident response or operations teams.
  • Demonstrated willingness and ability to learn new technologies quickly, with examples of picking up new platforms, tools, or domains outside of prior core expertise.
  • Familiarity with Agile or SAFe ways of working, including collaboration in sprint-based delivery models and cross-functional team engagement, is a plus.
  • Strong communication and collaboration skills, with the ability to work effectively with platform owners, operations teams, security teams, and external stakeholders.
  • Experience working in a modern Dev workflow using GitHub (branches, pull requests, code reviews, and CI/CD) to manage and deploy scripts/automation used for platform operations.
  • Working proficiency in scripting languages such as PowerShell, Python, Bash, or similar.
  • Knowledge of Azure, Azure Active Directory (AD), and hybrid cloud environments is a plus.
  • Exposure to SIEM concepts or platforms such as Azure Sentinel, CRIBL, or similar is a plus.
  • Experience with change management practices in an enterprise IT environment is beneficial.

Key Skills

  • APIs
  • C/C++
  • Computer Graphics
  • Go
  • React
  • Redux
  • Node.js
  • AWS
  • Library Services
  • Assembly
  • GraphQL
  • High Voltage