We are seeking a hands-on Infrastructure Engineer responsible for building, operating, and scaling Kubernetes clusters on physical (bare-metal) servers. This role focuses on data center infrastructure, hardware provisioning, storage, automation, and cluster reliability. The ideal candidate has strong experience working in on-prem environments and is not primarily cloud-focused.
Deploy, configure, and maintain Kubernetes clusters on bare-metal infrastructure
Perform rack & stack, hardware provisioning, cabling, and server lifecycle management
Implement Infrastructure as Code (IaC) using Terraform and Ansible
Design and manage persistent storage solutions (SAN, NAS, Ceph, or similar)
Monitor cluster health, performance, and availability using observability tools
Implement high availability, backup, and disaster recovery strategies
Manage networking configurations, including VLANs, load balancing, and DNS
Troubleshoot hardware, OS, networking, and cluster-level issues
Collaborate with platform, DevOps, and application teams to ensure reliability and scalability
Maintain documentation for infrastructure processes and runbooks
Strong experience with bare-metal server provisioning & data center operations
Hands-on experience with rack & stack and physical server management
Linux system administration (RHEL, Ubuntu, or similar)
Deep expertise in Kubernetes cluster deployment and operations
Experience managing multi-node clusters in production environments
Terraform & Ansible for provisioning and configuration management
Infrastructure as Code (IaC) best practices
Experience with persistent storage solutions (Ceph, GlusterFS, SAN/NAS)
Networking fundamentals: VLANs, routing, DNS, load balancing
Monitoring & logging tools such as Prometheus, Grafana, and the ELK stack
Experience ensuring high availability and system reliability
Experience with container runtimes and networking (CRI-O, containerd, CNI plugins)
Knowledge of disaster recovery and backup strategies
Exposure to security hardening and compliance practices
Scripting skills (Bash, Python)