Description:
We are seeking a highly skilled and passionate GKE Platform Engineer to join our growing team. This role is ideal for someone with deep experience managing Google Kubernetes Engine (GKE) platforms at scale, particularly with enterprise-level workloads on Google Cloud Platform (GCP). As part of a dynamic team, you will design, develop, and optimize Kubernetes-based solutions using tools like GitHub Actions, ACM, KCC, and workload identity to provide high-quality platform services to developers. You will drive CI/CD pipelines across multiple lifecycle stages, manage GKE environments at scale, and enhance the developer experience on the platform.
You should have a strong developer-experience mindset, focused on creating reliable, scalable, and efficient infrastructure to support developer needs. This is a fast-paced environment where collaboration across teams is key to delivering impactful results.
Responsibilities:
- GKE Platform Management at Scale: Manage and optimize large-scale GKE environments in a multi-cloud and hybrid-cloud context, ensuring the platform is highly available, scalable, and secure.
- CI/CD Pipeline Development: Build and maintain CI/CD pipelines using tools like GitHub Actions to automate deployment workflows across the GKE platform. Ensure smooth integration and delivery of services throughout their lifecycle.
- Enterprise GKE Management: Leverage advanced features of GKE, such as ACM (Anthos Config Management) and KCC (Kubernetes Config Connector), to manage GKE clusters efficiently at enterprise scale.
- Workload Identity & Security: Implement workload identity and security best practices to ensure secure access and management of GKE workloads.
- Custom Operators & Controllers: Develop custom operators and controllers for GKE, automating the deployment and management of custom services to enhance the developer experience on the platform.
- Developer Experience Focus: Maintain a developer-first mindset to create an intuitive, reliable, and easy-to-use platform for developers. Collaborate with development teams to ensure seamless integration with the GKE platform.
- GKE Deployment Pipelines: Provide guidelines and best practices for GKE deployment pipelines, leveraging tools like Kustomize and Helm to manage and deploy GKE configurations effectively. Ensure pipelines are optimized for scalability, security, and repeatability.
- Zero Trust Model: Ensure GKE clusters operate effectively within a Zero Trust security model. Maintain a strong understanding of the principles of Zero Trust security, including identity and access management, network segmentation, and workload authentication.
- Ingress Patterns: Design and manage multi-cluster and multi-regional ingress patterns to ensure seamless traffic management and high availability across geographically distributed Kubernetes clusters.
- Deep Troubleshooting & Support: Provide deep troubleshooting knowledge and support to help developers pinpoint issues across the GKE platform, focusing on debugging complex Kubernetes issues, application failures, and performance bottlenecks. Utilize diagnostic tools and debugging techniques to resolve critical platform-related issues.
- Observability & Logging Tools: Implement and maintain observability across GKE clusters using monitoring, logging, and alerting tools like Prometheus, Dynatrace, and Splunk. Ensure proper logging and metrics are in place to enable developers to effectively monitor and diagnose issues within their applications.
- Platform Automation & Integration: Automate platform management tasks such as scaling, upgrading, and patching using tools like Terraform, Helm, and GKE APIs.
- Continuous Improvement & Learning: Stay up to date with the latest trends and advancements in Kubernetes, GKE, and Google Cloud services to continuously improve platform capabilities.
Qualifications:
Experience:
- 8 years of overall experience in cloud platform engineering, infrastructure management, and enterprise-scale operations.
- 5 years of hands-on experience with Google Cloud Platform (GCP), including designing, deploying, and managing cloud infrastructure and services.
- 5 years of experience specifically with Google Kubernetes Engine (GKE), managing large-scale, production-grade clusters in enterprise environments.
- Experience with deploying, scaling, and maintaining GKE clusters in production environments.
- Hands-on experience with CI/CD practices and automation tools like GitHub Actions.
- Proven track record of building and managing GKE platforms in a fast-paced, dynamic environment.
- Experience developing custom Kubernetes operators and controllers for managing complex workloads.
- Deep Troubleshooting Knowledge: Strong ability to troubleshoot complex platform issues, with expertise in diagnosing problems across the entire GKE stack.
Technical Skills:
Must Have:
- Google Cloud Platform (GCP): Extensive hands-on experience with GCP, particularly Kubernetes Engine (GKE), Cloud Storage, Cloud Pub/Sub, Cloud Logging, and Cloud Monitoring.
- Kubernetes (GKE) at Scale: Expertise in managing large-scale GKE clusters, including security configurations, networking, and workload management.
- CI/CD Automation: Strong experience with CI/CD pipeline automation tools, particularly GitHub Actions, for building, testing, and deploying applications.
- Kubernetes Operators & Controllers: Ability to develop custom Kubernetes operators and controllers to automate and manage applications on GKE.
- Workload Identity & Security: Solid understanding of Kubernetes workload identity and access management (IAM) best practices, including integration with GCP Identity and Google Cloud IAM.
- Anthos & ACM: Hands-on experience with Anthos Config Management (ACM) and Kubernetes Config Connector (KCC) to manage and govern GKE clusters and workloads at scale.
- Infrastructure as Code (IaC): Experience with tools like Terraform to manage GKE infrastructure and cloud resources.
- Helm & Kustomize: Experience in using Helm and Kustomize for packaging, deploying, and managing Kubernetes resources efficiently. Ability to create reusable and scalable Kubernetes deployment templates.
- Observability & Logging Tools: Experience with observability tools such as Prometheus, Dynatrace, and Splunk to monitor and log GKE performance, providing developers with actionable insights for troubleshooting.
Nice to Have:
- Zero Trust Security Model: Strong understanding of implementing and maintaining security in a Zero Trust model for GKE, including workload authentication, identity management, and network security.
- Ingress Patterns: Experience with designing and managing multi-cluster and multi-regional ingress in Kubernetes to ensure fault tolerance, traffic management, and high availability.
- Familiarity with Open Policy Agent (OPA) for policy enforcement in Kubernetes environments.
Education & Certification:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Relevant GCP certifications, such as Google Cloud Certified Professional Cloud Architect or Google Cloud Certified Professional Cloud Developer.
Soft Skills:
- Collaboration: Strong ability to work with cross-functional teams to ensure platform solutions meet development and operational needs.
- Problem-Solving: Excellent problem-solving skills with a focus on troubleshooting and performance optimization.
- Communication: Strong written and verbal communication skills, with the ability to communicate effectively with both technical and non-technical teams.
- Initiative & Ownership: Ability to take ownership of platform projects, driving them from conception to deployment with minimal supervision.
- Adaptability: Willingness to learn new technologies and adjust to evolving business needs.