About the role:
We are seeking a production support Site Reliability Engineer (SRE) with strong automation skills to join our dynamic team. The ideal candidate will be responsible for ensuring the reliability, availability and performance of our production systems while driving automation and operational excellence.
Key responsibilities:
- Provide day-to-day operational support for production environments, ensuring high availability and reliability of critical services.
- Develop, maintain and enhance automation scripts and tools using Bash, Python and Ansible to streamline operational tasks and incident response.
- Monitor system performance, proactively identify issues and implement solutions to prevent service disruptions.
- Collaborate with development, QA and infrastructure teams to implement best practices for deployment, monitoring and incident management.
- Participate in the on-call rotation and respond to production incidents, performing root cause analysis and driving resolution.
- Maintain and improve configuration management, CI/CD pipelines and infrastructure-as-code practices.
- Document operational processes, troubleshooting steps and automation workflows.
Required skills and experience:
- Proven experience in a production support or SRE role within a complex, high-availability environment.
- Strong automation skills, with proficiency in Bash, Python and Ansible.
- Experience with monitoring and alerting tools (e.g. Prometheus, Grafana, Elastic Stack, Datadog).
- Solid understanding of Linux/Unix systems administration and troubleshooting.
- Familiarity with cloud platforms (e.g. AWS) and containerisation technologies (e.g. Docker, Kubernetes).
- Experience with configuration management and infrastructure-as-code tools (e.g. Terraform, CloudFormation).
- Knowledge of networking fundamentals, security best practices and incident management processes.
- Excellent problem-solving skills, attention to detail and the ability to work under pressure.
- Strong communication and collaboration skills.
Desirable skills:
- Experience with version control systems (e.g. Git).
- Familiarity with Agile methodologies and DevOps culture.
- Exposure to database administration and troubleshooting (e.g. MySQL, PostgreSQL, Oracle).
- Scripting or automation experience in other languages (e.g. Go, Ruby).
We are proud to be an equal opportunity employer. We do not discriminate against individuals on the basis of race, gender, age, citizenship, religion, sexual orientation, gender identity or expression, disability or any other legally protected factor. We value the unique talents of all our people, who come from diverse backgrounds with different personal experiences and points of view, and we are committed to providing an environment of mutual respect.
Additional Information
This job description outlines only the main activities of the role and is not exhaustive. It does not preclude the addition of further tasks or projects.
Required Experience:
Individual Contributor (IC)