Are you ready to power the World's connections?
If you don't think you meet all of the criteria below but are still interested in the job, please apply. Nobody checks every box - we're looking for candidates who are particularly strong in a few areas and have some interest and capabilities in others.
As a Site Reliability Engineer, you'll join the global Platform SRE team responsible for building, operating, and scaling Kong's multi-region SaaS platform that powers the world's API connectivity.
You'll design, automate, and run production systems serving thousands of customers across AWS, GCP, and Azure. You'll work on everything from multi-region Kubernetes clusters to service mesh and gateway architectures, ensuring the reliability, scalability, and security of Kong's SaaS offerings.
This is a hands-on role, ideal for engineers who thrive on running production SaaS systems at scale, automating operations, and continuously improving performance, resilience, and deployment pipelines.
Operate and scale Kong's global SaaS platform (Konnect), ensuring reliability, availability, and performance across regions and clouds.
Build, automate, and maintain Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD.
Design, maintain, and optimize multi-region data and caching layers, including PostgreSQL, Redis, ClickHouse, and Druid, for high availability and low latency.
Operate and improve Kong Gateway and Kong Mesh environments supporting hybrid and distributed architectures.
Develop and maintain CI/CD pipelines and GitOps workflows to automate service delivery and ensure consistent infrastructure changes.
Enhance observability and incident response readiness through systems like Datadog, Prometheus, Grafana, and Thanos, defining and tracking SLOs.
Collaborate closely with development and security teams to ensure smooth operation of SaaS services in compliance with reliability, security, and regulatory standards.
Participate in a global 24/7 on-call rotation and drive continuous improvement of operational playbooks and postmortem practices.
Lead and contribute to scaling initiatives that improve elasticity, reliability, and cost-efficiency across the SaaS platform.
BS in Computer Science or equivalent practical experience.
Proven experience managing SaaS or PaaS systems at enterprise scale (multi-region, multi-tenant, secure environments).
Deep expertise in Kubernetes, including debugging cluster/networking issues and designing for fault tolerance and scalability.
Strong proficiency with Infrastructure as Code tools like Terraform or Terragrunt.
Experience with CI/CD pipelines and GitOps workflows (ArgoCD, Atlantis, Helm).
Proficiency in one or more programming languages (Go, Python, Bash) for automation and tooling.
Solid understanding of Linux/Unix systems, networking (DNS, TLS/SSL, HTTP), load balancers, and distributed systems.
Experience working with API gateway and service mesh technologies.
Familiarity with streaming systems like Kafka and observability platforms (Datadog, Prometheus, Grafana).
Experience working in a 24/7/365 production support environment.
Hands-on experience with Kong Gateway, Kong Mesh, or similar service connectivity technologies.
Experience operating ClickHouse, Druid, or other time-series and analytics databases.
Experience managing PostgreSQL and Redis in multi-region configurations.
Working knowledge of AWS networking (PrivateLink, Transit Gateway, VPC Peering, Firewalls), Azure VNet, or GCP NCC.
Strong understanding of disaster recovery, resiliency testing, and compliance-driven reliability practices.
About Kong:
Kong Inc., a leading developer of API and AI connectivity technologies, is building the infrastructure that powers the agentic era. Trusted by the Fortune 500 and startups alike, Kong's unified API and AI platform, Kong Konnect, enables organizations to secure, manage, accelerate, govern, and monetize the flow of intelligence across APIs and AI models. For more information visit .
Required Experience:
Senior IC
Kong is the most widely adopted API gateway and service mesh, powering the world’s APIs for modern architectures. Accelerate development and productivity today!