OpenShift Platform Lead
Job Information
Job Title: OpenShift Platform Lead - Virtualization Services
Job Summary
We are seeking an experienced OpenShift Platform Lead to own and manage our OpenShift-based virtualization platform that delivers enterprise VM hosting services. This role is responsible for the complete lifecycle management of the platform, including design, architecture, BAU operations, patching, upgrades, incident response, and driving platform stability.
You will lead the implementation, work closely with SRE and operations teams, and enable seamless VM migration from legacy infrastructure. This is a hands-on technical leadership role requiring deep OpenShift expertise and the ability to balance operational excellence with strategic platform evolution.
Key Responsibilities
Platform Leadership & Strategy
- Own the technical strategy and roadmap for the OpenShift Virtualization platform
- Define platform architecture, design patterns, and technical standards
- Lead platform lifecycle management including major/minor upgrades and Red Hat CoreOS updates
- Drive platform stability improvements and performance optimization initiatives
- Establish platform governance, compliance, and security policies
- Build relationships with Red Hat support and leverage Technical Account Management (TAM)
Lifecycle & Operations Management
- Manage complete platform lifecycle from installation through upgrades to decommissioning
- Plan and execute OpenShift platform upgrades (4.x releases) with zero/minimal downtime
- Coordinate quarterly/monthly Red Hat CoreOS (RHCOS) patching cycles
- Oversee OpenShift Virtualization operator upgrades and feature enablement
- Maintain platform health through proactive monitoring and capacity planning
- Ensure platform meets defined SLAs and availability targets (99.9%)
Incident & Event Management
- Lead Major Incident response for platform-level issues (Sev 1/2)
- Perform root cause analysis (RCA) and implement preventive measures
- Collaborate with SRE team on incident postmortems and improvement plans
- Manage platform-related events including maintenance windows
- Coordinate emergency changes and rollback procedures
- Participate in on-call rotation for critical platform escalations
Change Implementation & Release Management
- Review and approve platform changes through Change Advisory Board (CAB)
- Plan and execute complex platform changes with risk assessment
- Implement infrastructure-as-code (IaC) practices using Ansible and Terraform
- Drive GitOps adoption for platform configuration management
- Coordinate release windows for platform updates with business stakeholders
- Ensure change documentation and runbook accuracy
VM Migration & Workload Onboarding
- Lead VM migration strategy from VMware/legacy platforms to OpenShift Virtualization
- Design VM migration runbooks and automation workflows
- Create and maintain VM templates, golden images, and standardized configurations
- Enable application teams for self-service VM provisioning
- Troubleshoot VM performance, networking, and storage issues
- Optimize VM placement, resource allocation, and cluster balancing
Platform Stability & Performance
- Define and monitor key performance indicators (KPIs) for platform health
- Implement chaos engineering practices to validate platform resilience
- Tune OpenShift control plane and worker node performance
- Optimize storage performance (ODF/Ceph) for VM workloads
- Configure network policies and OVN-Kubernetes for optimal VM networking
- Drive continuous improvement initiatives based on operational metrics
Required Qualifications
Must-Have Skills & Experience
Experience Requirements:
- 8-12 years of overall IT infrastructure experience
- 5 years of hands-on experience with Red Hat OpenShift Container Platform (4.x)
- 3 years of experience with OpenShift Virtualization (KubeVirt) or similar VM hosting platforms
- 3 years of experience in platform/infrastructure leadership roles
- 2 years of experience with Red Hat Enterprise Linux (RHEL 7/8/9) and Red Hat CoreOS (RHCOS)
Technical Skills:
- Expert-level OpenShift administration (oc CLI, Web Console, API)
- Advanced OpenShift Virtualization knowledge (VMs, DataVolumes, CDI, live migration)
- Advanced Red Hat CoreOS and Machine Config Operator (MCO) experience
- Advanced Linux administration and troubleshooting (RHEL-based)
- Advanced storage management (ODF/Ceph, Storage Classes, PV/PVC, CSI drivers)
- Advanced networking (OVN-Kubernetes, Multus, Network Policies, SDN concepts)
- Advanced automation skills (Ansible, Bash scripting, Python)
- Intermediate Kubernetes concepts (Operators, Custom Resources, Pod lifecycle)
- Intermediate Infrastructure-as-Code (Terraform, GitOps tools like ArgoCD/Flux)
- Intermediate observability platforms (Prometheus, Grafana, Alertmanager)
Platform Operations:
- Proven experience managing platform lifecycle (installation, upgrades, patching)
- Strong incident management and major incident response experience
- Experience with change management processes and release coordination
- Demonstrated ability to perform root cause analysis and implement preventive measures
- Experience with capacity planning and performance tuning
- Track record of driving platform stability improvements
Certifications Required (one or more):
- Red Hat Certified Engineer (RHCE)
- Red Hat Certified Specialist in OpenShift Administration
- OR equivalent demonstrable experience
Desirable Skills & Experience
Highly Desirable:
- Red Hat Certified Architect (RHCA) certification
- Red Hat Certified Specialist in OpenShift Virtualization
- Experience with Red Hat Advanced Cluster Management (RHACM)
- Experience with Red Hat Advanced Cluster Security (RHACS/StackRox)
- GitOps expertise (ArgoCD, Flux, Tekton)
- Chaos engineering experience (Litmus, Chaos Mesh)
- Experience with OpenShift on multiple infrastructures (bare metal, VMware, AWS, Azure)
Nice to Have:
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
- Experience with multi-tenancy and namespace isolation strategies
- Knowledge of compliance frameworks (PCI-DSS, HIPAA, SOC2, ISO 27001)
- Experience with backup solutions (Kasten K10, Veeam, Commvault)
- Programming skills in Go, Python, or Java
- Experience with hybrid/multi-cloud architectures
- ITIL v4 Foundation certification
Key Success Metrics
- Platform availability: 99.9% uptime
- Successful upgrade completion rate: 100% with zero unplanned rollbacks
- Incident MTTR: <2 hours for Sev 1/2 incidents
- VM migration velocity: Target VMs per month with <5% issues
- Platform capacity utilization: 70-80% optimal range
- Change success rate: >98% first-time success
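As a rough illustration of what the 99.9% availability target implies in practice, the sketch below converts it into a downtime budget. The function name and the 30-day and 365-day measurement periods are assumptions made for this example; the actual SLA measurement window is not specified here.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime permitted in a period at a given availability target.

    Both the function and the period lengths used below are illustrative assumptions.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Downtime allowed by a 99.9% target over an assumed 30-day month and 365-day year.
monthly = downtime_budget_minutes(99.9, 30)
yearly = downtime_budget_minutes(99.9, 365)

print(f"30-day budget: {monthly:.1f} minutes")      # 30-day budget: 43.2 minutes
print(f"365-day budget: {yearly / 60:.2f} hours")   # 365-day budget: 8.76 hours
```

Under these assumptions, a single Sev 1 incident resolved at the 2-hour MTTR limit would by itself exceed the 30-day budget, so the availability and MTTR targets are best read together.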
Work Environment
- Some evening/weekend work required for maintenance windows
- Available 24x7 during major issues
Required Skills:
infrastructure