Job Description:
We are seeking a skilled and dedicated Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, performance, and scalability of our systems and applications. This role combines software development and systems engineering to build and run large-scale, distributed, fault-tolerant systems.
RESPONSIBILITIES:
- Infrastructure Management: Design, build, and maintain the infrastructure required to support a high-volume, high-availability environment.
- Monitoring and Incident Response: Develop and implement monitoring strategies to detect and resolve system issues before they impact users. Participate in on-call rotation to manage and mitigate incidents.
- Automation: Automate repetitive tasks to improve efficiency and reliability of the system. Implement CI/CD pipelines to ensure smooth deployments.
- Performance Tuning: Analyze and optimize system performance, including troubleshooting latency issues and enhancing system throughput.
- Capacity Planning: Forecast system capacity and plan for future scaling needs. Ensure systems are resilient to handle increased loads.
- Collaboration: Work closely with software engineers, QA, product managers, and other stakeholders to ensure the delivery of reliable and performant services.
- Documentation: Create and maintain detailed documentation of system architecture, processes, and procedures.
QUALIFICATIONS:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Minimum 3 years of experience in a Site Reliability Engineer, DevOps, or similar role.
- Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker).
- Proficient in scripting and automation using languages like Python, Bash, or Ruby.
- Strong understanding of networking, security, and system administration.
Requirements
SKILLS:
- Familiarity with configuration management tools (Ansible, Chef, Puppet).
- Experience with monitoring tools (Prometheus, Grafana, Nagios).
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration skills.
- Experience with database management (SQL, NoSQL).
- Knowledge of Infrastructure as Code (IaC) using tools like Terraform or Pulumi.
- Familiarity with Agile/Scrum methodologies.
- Certification in relevant technologies (e.g., AWS Certified DevOps Engineer) is a plus.
Benefits
Be part of the exciting Growth Story of Thoucentric!
Work on projects that help you stay ahead of the curve. Beyond exciting projects, if you are a self-starter, you will also get multiple opportunities to design, drive, and contribute to organizational and practice initiatives.
A constant learning curve with a very approachable and intellectual group of consultants.
Be part of One Extended Family. We bond beyond work: sports, get-togethers, common interests, etc. Work in a very enriching environment with an Open Culture, Flat Organization, and Excellent Peer Group.