TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Senior Site Reliability Engineer (SRE) / DevOps Engineer
Job Type: Temporary Assignment
Job Description
- We are seeking a highly experienced SRE / DevOps Engineer to support and scale a Kubernetes-based API Gateway platform built on a Java technology stack.
- The role focuses on reliability, observability, automation, and performance, while also contributing to POCs around next-generation AI Gateway capabilities.
Key Responsibilities
Platform Reliability & Operations
- Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
- Design and implement SRE best practices, including SLIs, SLOs, SLAs, error budgets, and incident management
- Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
- Drive capacity planning, performance tuning, and resilience testing
Kubernetes & Cloud Engineering
- Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
- Develop and maintain Helm charts, manifests, and deployment strategies
- Implement rollout strategies such as blue-green, canary, and rolling deployments
- Collaborate with development teams to ensure cloud-native design patterns
Observability & Monitoring (Strong Focus)
- Build and maintain enterprise-grade observability (O11y) solutions:
  - Prometheus & Grafana for metrics and dashboards
  - Splunk for centralized logging and alerting
  - OpenTelemetry for distributed tracing
- Define actionable alerts and dashboards for platform and application health
- Improve MTTR through better visibility and automation
CI/CD & Automation
- Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
- Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Develop automation scripts using Python, Bash, or Groovy
Security & Compliance
- Implement DevSecOps practices, including secrets management, image scanning, and RBAC
- Work closely with security teams on vulnerability remediation and compliance controls
Innovation & POCs
- Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
- Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
- Stay current with emerging SRE, cloud, and AI gateway technologies
Required Skills & Qualifications
Must Have
- 7-8 years of experience in SRE / DevOps / Platform Engineering
- Strong hands-on experience with Kubernetes in production environments
- Solid understanding of Java-based applications and JVM performance considerations
- Deep expertise in Splunk, Prometheus, Grafana, and observability practices
- Experience operating API Gateway platforms (Kong, Apigee, NGINX, Istio, etc.)
- Strong Linux fundamentals and networking knowledge (TCP/IP, DNS, HTTP, TLS)
- Experience with cloud platforms (AWS / Azure / GCP)
Nice to Have
- Experience with OpenTelemetry and distributed tracing
- Exposure to AI Gateway / Intelligent Traffic Management concepts
- Experience with service mesh (Istio / Linkerd)
- Certification in Kubernetes (CKA / CKAD) or Cloud platforms
Soft Skills
- Strong troubleshooting and problem-solving skills
- Ability to work cross-functionally with developers, architects, and security teams
- Proactive mindset with a passion for automation and reliability
- Good documentation and communication skills
TekWissen Group is an equal opportunity employer supporting workforce diversity.