Salary Not Disclosed
1 Vacancy
This position will primarily focus on providing design and implementation expertise in infrastructure provisioning, management, and lifecycle implementation of cloud components and services, containers, and other critical concepts of DevSecOps principles.
Key Responsibilities:
Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis.
Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured.
Incident Response & SLOs: Define SLIs, SLOs, and error budgets; develop alerting strategies using Alertmanager; and automate incident response processes.
High Availability & Scalability: Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions like Thanos for long-term storage and MinIO for object storage.
Security & Compliance: Implement observability best practices in compliance with security frameworks and Kubernetes security tools such as NeuVector.
Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators.
Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation.
Qualifications:
Active Secret or Top Secret Clearance.
Strong Kubernetes expertise in managing and monitoring clusters at scale.
Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir.
Proficiency in logging and tracing frameworks including Promtail, Fluent Bit, and OpenTelemetry.
Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations.
Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring.
Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks.
Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators.
Strong troubleshooting and root cause analysis skills in large-scale distributed systems.
Experience working in air-gapped or limited connectivity environments is a plus.
Additional Information:
Oteemo is an equal employment and affirmative action employer. We evaluate qualified applicants on merit and business needs and not on race, color, religion, creed, gender, sexual orientation, national origin, ancestry, age, disability, genetic information, marital status, veteran status, or any other factor protected by law. Oteemo complies with the law regarding reasonable accommodations for handicapped and disabled employees.
Remote Work:
No
Employment Type:
Full-time