We are seeking a highly skilled Senior DevOps Engineer to manage our infrastructure, implement GitOps practices, and administer our Kubernetes clusters. The ideal candidate will have a strong development background with excellent debugging skills. You will be responsible for designing, implementing, and maintaining our infrastructure, ensuring high availability, scalability, and security. Your expertise in Kubernetes, networking, and monitoring will be crucial in optimizing our systems and supporting our development teams.
Infrastructure Management:
- Design, implement, and manage complex architectures.
- Optimize resource utilization and implement cost-saving measures.
Kubernetes Administration:
- Design, deploy, and manage Kubernetes clusters.
- Implement and maintain Kubernetes best practices for security, scalability, and performance.
- Troubleshoot complex issues within Kubernetes clusters and applications.
GitOps Implementation:
- Implement and maintain GitOps workflows for infrastructure and application deployments.
- Integrate GitOps practices with CI/CD pipelines and Kubernetes.
- Ensure version control and auditability of infrastructure and application configurations.
Networking:
- Design and implement complex networking solutions in isolated on-premises environments, including subnets and route tables.
- Set up and manage VPN connections for secure access.
Monitoring and Observability:
- Design and implement a comprehensive monitoring stack using tools such as Prometheus, Grafana, and the ELK stack.
- Set up alerting and incident response systems.
- Implement logging solutions and log analysis tools.
Security and Compliance:
- Implement Kubernetes security best practices.
- Conduct regular security audits and implement necessary improvements.
- Ensure compliance with industry standards and regulations.
CI/CD Pipeline:
- Design and implement CI/CD pipelines using tools such as Jenkins, GitLab CI, or GitHub Actions.
- Integrate CI/CD pipelines with Kubernetes and GitOps workflows.
Debugging and Troubleshooting:
- Apply strong debugging skills to resolve complex issues across the entire stack.
- Perform root cause analysis and implement long-term solutions.
Documentation and Knowledge Sharing:
- Maintain comprehensive documentation for infrastructure, processes, and best practices.
- Mentor junior team members and share knowledge across the organization.
You will need to have:
1. Bachelor's degree or four or more years of work experience.
2. Experience in DevOps or Site Reliability Engineering roles with a strong focus on Kubernetes.
3. Proven development background with excellent debugging skills.
4. Experience with Kubernetes administration, including cluster management, security, and troubleshooting.
5. Proficiency in GitOps practices and tools (e.g., Flux, ArgoCD).
6. Strong skills in implementing and managing monitoring solutions (e.g., Prometheus, Grafana, ELK stack).
7. Strong scripting skills in languages such as Python, Go, or Bash.
Even better if you have one or more of the following:
1. Strong communication skills to collaborate with cross-functional teams and explain technical concepts.
2. Experience with MLOps (MLflow or Kubeflow).
3. Knowledge of security best practices in on-premises and Kubernetes environments.
4. Good knowledge of Rancher.
gitops,monitoring,scripting,infrastructure,grafana,kubernetes,debugging,security,devops,prometheus,networking,azure,ci/cd