We are seeking an experienced Kafka Administrator with a strong automation focus to manage, secure, and scale high-availability Kafka clusters across on-premise and cloud environments. The ideal candidate will leverage Infrastructure-as-Code (IaC) and automation tools such as Ansible, Terraform, and GitOps frameworks to minimize manual operations and improve reliability.
This role involves deploying Kafka on Kubernetes/OpenShift using Confluent for Kubernetes (CFK), automating cluster operations, ensuring security compliance, and maintaining performance monitoring and disaster recovery strategies.
Develop and maintain Ansible playbooks and Python/Shell scripts to automate Kafka installation, configuration, upgrades, and patching.
Implement Terraform-based infrastructure provisioning and GitOps workflows using tools like ArgoCD or Jenkins for continuous deployment.
Automate Kafka cluster operations including topic and partition management.
Deploy and manage Confluent Platform using Confluent for Kubernetes (CFK) on Kubernetes/OpenShift.
Configure and maintain KRaft (Kafka Raft) mode, topics, partitions, and replication factors to ensure high availability and fault tolerance.
Perform cluster scaling, upgrades, and performance optimization.
Implement monitoring using Prometheus, Grafana, JMX, and Confluent Control Center.
Monitor broker health, consumer lag, throughput, and producer latency.
Troubleshoot performance bottlenecks and ensure optimal system performance.
Implement and automate Kafka security mechanisms including:
TLS/SSL encryption
SASL authentication
Kerberos integration
RBAC and ACL management
Manage certificate lifecycle and ensure compliance with security policies.
Configure and manage Kafka Connect clusters and connectors (e.g., S3, JDBC, Snowflake).
Manage schemas and enforce data governance using Confluent Schema Registry.
Design and maintain Active-Passive cluster setups for disaster recovery.
Configure Cluster Linking or multi-region replication for high availability and data resilience.