Kafka Administrator – Automation
Job Summary
Job Title: Kafka Administrator - Automation
Role Summary
We are seeking an experienced Kafka Administrator with a strong automation focus to manage, secure, and scale high-availability Kafka clusters across on-premises and cloud environments. The ideal candidate will use Infrastructure-as-Code (IaC) and automation tools such as Ansible, Terraform, and GitOps frameworks to minimize manual operations and improve reliability.
This role involves deploying Kafka on Kubernetes/OpenShift using Confluent for Kubernetes (CFK), automating cluster operations, ensuring security compliance, and maintaining performance monitoring and disaster recovery strategies.
Key Responsibilities
Automation & Infrastructure as Code (IaC)
- Develop and maintain Ansible playbooks and Python/Shell scripts to automate Kafka installation, configuration, upgrades, and patching.
- Implement Terraform-based infrastructure provisioning and GitOps workflows using tools like ArgoCD or Jenkins for continuous deployment.
- Automate Kafka cluster operations, including topic and partition management.
Kafka Cluster Management & Deployment
- Deploy and manage Confluent Platform using Confluent for Kubernetes (CFK) on Kubernetes/OpenShift.
- Configure and maintain KRaft (Kafka Raft) mode, topics, partitions, and replication factors to ensure high availability and fault tolerance.
- Perform cluster scaling, upgrades, and performance optimization.
Monitoring & Performance Optimization
- Implement monitoring using Prometheus, Grafana, JMX, and Confluent Control Center.
- Monitor broker health, consumer lag, throughput, and producer latency.
- Troubleshoot performance bottlenecks and ensure optimal system performance.
Security & Compliance
- Implement and automate Kafka security mechanisms, including:
  - TLS/SSL encryption
  - SASL authentication
  - Kerberos integration
  - RBAC and ACL management
- Manage certificate lifecycle and ensure compliance with security policies.
Kafka Connect & Schema Management
- Configure and manage Kafka Connect clusters and connectors (e.g., S3, JDBC, Snowflake).
- Manage schemas and enforce data governance using Confluent Schema Registry.
Disaster Recovery & High Availability
- Design and maintain Active-Passive cluster setups for disaster recovery.
- Configure Cluster Linking or multi-region replication for high availability and data resilience.