Site Reliability Engineer (SRE): Vulnerability Management, Observability & Server Patching
Seattle, WA
Role Overview
This role is responsible for ensuring the security, reliability, and operational excellence of server infrastructure through proactive vulnerability management, effective server patching, and robust observability practices. The SRE will leverage platforms such as Brinqa for vulnerability aggregation and prioritization, and Datadog for monitoring, alerting, and service observability.
The ideal candidate will work closely with engineering, security, and application teams to identify and remediate risks, execute patching strategies, and continuously improve system visibility, reliability, and compliance.
Key Responsibilities
Vulnerability Management
- Manage and continuously improve the enterprise vulnerability management program, using Brinqa for aggregation, prioritization, and reporting.
- Identify, analyze, and assess vulnerabilities across server infrastructure, including operating systems, applications, and supporting components.
- Partner with security, infrastructure, and application teams to prioritize remediation efforts based on risk and business impact.
- Ensure adherence to corporate security policies, regulatory requirements, and industry best practices.
Server Patching & Remediation
- Plan, schedule, and execute server patching activities for operating systems and third-party software.
- Track patch compliance and remediation metrics, including mean time to patch (MTTP).
- Develop and maintain automation scripts and tooling to streamline patching workflows and improve efficiency.
- Reduce operational risk by standardizing patching processes and minimizing service disruption.
Observability & Reliability
- Maintain and enhance observability of supported services using Datadog.
- Design and implement effective monitoring, alerting, and dashboards to improve service reliability and operational awareness.
- Define and measure service-level indicators (SLIs), service-level objectives (SLOs), and success metrics.
- Analyze incidents and trends to drive continuous improvement in system reliability and performance.
Collaboration & Operations
- Collaborate with application owners, platform teams, and other stakeholders to support core SRE and operational objectives.
- Provide guidance and best practices related to reliability, security, and operational resilience.
- Support incident response, root cause analysis, and post-incident reviews where applicable.
Skills & Qualifications
- Strong hands-on experience with server operating systems (Windows Server, Linux) and patching methodologies.
- Solid understanding of vulnerability management frameworks, risk-based prioritization, and remediation practices.
- Experience with vulnerability management tools such as Brinqa, Qualys, or similar platforms.
- Proven experience implementing observability solutions using Datadog.
- Experience working in on-premises and Microsoft Azure environments.
- Hands-on experience with containerized applications using Docker and Kubernetes (K8s).
- Experience with CI/CD pipelines, including GitOps-based deployments using ArgoCD.
- Proficiency in automation and scripting (e.g., Python, PowerShell, Bash).
- Experience supporting on-call rotations, incident response, and production issue resolution.
- Good knowledge of networking concepts, including TCP/IP, DNS, load balancing, firewall rules, and troubleshooting connectivity issues.
- Familiarity with ITIL concepts and operational best practices.
- Strong communication and cross-team collaboration skills.
- Ability to work independently, manage multiple priorities, and operate effectively in a fast-paced environment.