Senior Kafka & RabbitMQ SRE

Job Location: Irvine, CA, USA
Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

Position: Senior Messaging Platform SRE (Kafka & RabbitMQ)

We are seeking a Senior Messaging Platform SRE to own the reliability, scalability, and operational excellence of enterprise messaging and event-streaming platforms, including Confluent Kafka and RabbitMQ, running on AWS and the Confluent Platform. This role is focused on platform operations, SRE practices, and infrastructure engineering, ensuring these platforms meet strict SLAs/SLOs for availability, latency, durability, and security. The engineer will be part of the Operations team and will serve as the single point of contact for Kafka and RabbitMQ issues.

Qualifications:

  • 8 years of experience in SRE, Platform Engineering, or Infrastructure Operations.
  • 3 years operating Confluent Kafka in production at scale.
  • 2 years operating RabbitMQ in high-availability distributed environments.
  • Strong hands-on experience with AWS-based deployments (MSK, EC2, EBS, ALB/NLB, IAM).
  • Deep knowledge of Kafka internals (brokers, partitions, ISR, replication, rebalancing).
  • Strong operational understanding of RabbitMQ internals (clustering, mirrored/quorum queues, flow control).
  • Expertise in Kubernetes (EKS) for platform workloads and supporting microservices.
  • Infrastructure-as-Code experience using Terraform and Helm.
  • Advanced experience with monitoring, alerting, and logging platforms (Splunk, Prometheus, Grafana, ELK).

Responsibilities:

  • Own the end-to-end reliability of Kafka and RabbitMQ platforms, including uptime, performance, capacity, and fault tolerance.
  • Define and track SLOs and operational KPIs for messaging platforms.
  • Lead incident response, root cause analysis (RCA), and post-incident reviews for Kafka and RabbitMQ outages.
  • Operate and maintain Confluent Kafka and RabbitMQ clusters.
  • Standardize operational runbooks for cluster lifecycle management, broker/node failures, rebalancing, and disaster recovery.
  • Act as the primary escalation point for Kafka and RabbitMQ production issues.
  • Mentor junior engineers and influence platform-wide SRE best practices.
  • Partner with architecture, security, and application teams to evolve the messaging platform roadmap.

Skill Category (Your Experience in Years):

  • Confluent Kafka Platform Engineering
  • RabbitMQ Platform Operations
  • AWS Infrastructure for Messaging Platforms
  • SRE Observability & Operational Excellence
  • Kubernetes, IaC & Automation

Brandon Consulting Associates Inc. is an EQUAL OPPORTUNITY EMPLOYER and has been in business for 29 years.
