About the Role:
We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2 years of experience to help us build, scale, and maintain robust, secure, high-availability infrastructure. As a DevOps/SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.
This is a hands-on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem-solving mindset.
Key Responsibilities:
- Design, implement, and maintain CI/CD pipelines for reliable code deployment
- Monitor application performance and system reliability using tools like Prometheus, Grafana, or Datadog
- Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
- Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
- Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
- Automate repetitive operational tasks and improve development workflows
- Implement and enforce security, backup, and disaster recovery strategies
- Participate in the on-call rotation and respond to incidents with root cause analysis and postmortem reviews
- Work closely with development teams to ensure applications are designed for performance, availability, and scalability
- Optimize resource usage and costs across cloud environments
Qualifications:
Required:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 2 years of experience in a DevOps, SRE, or Systems Engineering role
- Hands-on experience with Linux/Unix system administration
- Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
- Working knowledge of cloud platforms (AWS, GCP, Azure)
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code tools like Terraform, Ansible, or similar
- Proficiency in at least one scripting or programming language (e.g., Bash, Python, Go)
- Strong understanding of monitoring, logging, and alerting systems
- Experience with version control using Git
Preferred:
- Experience with Kubernetes administration in production environments
- Familiarity with security best practices and compliance standards
- Understanding of networking, load balancing, and DNS configuration
- Exposure to incident management and SLA/SLO/SLI concepts
- Experience working in Agile environments