Senior Site Reliability Engineer Ireland

Dublin - Ireland

Monthly Salary: Not Disclosed

Posted on: 2 hours ago

Vacancies: 1 Vacancy

Department: Software Engineering

Job Summary

Who You'll Work For

We are seeking an experienced and analytically minded Site Reliability Engineer to join our organisation on a permanent, remote basis. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.

What You'll Do

Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards
Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments
Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime
Create and maintain detailed incident response runbooks; conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence
Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows
Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems
Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability
Triage platform and infrastructural issues with decisiveness and analytical rigour; engage with third-party vendors and support teams as required
Deploy new systems and updates in a staged, risk-managed manner, ensuring safe and incremental rollouts
Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems
Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements

#automation #Ansible #Terraform #observability #Prometheus #Grafana #cloud platforms #AWS #GCP #Azure #container #orchestration #Kubernetes #Docker #CI/CD #Jenkins #GitLab

Qualifications:

**Essential Requirements:**

Bachelor's degree in Computer Science, Engineering, or equivalent professional experience (5 years in a related infrastructure or systems role)
Proficiency in one or more programming languages: Go, Python, or Bash shell scripting, with the ability to implement medium-complexity automation workflows
Strong knowledge of Linux or UNIX from both administration and debugging perspectives
Hands-on experience operating software systems, infrastructure, and complex applications at scale in production environments
Demonstrated expertise in infrastructure-as-code principles and practices
Strong problem-solving and software troubleshooting skills with a methodical, analytical approach
Experience with server provisioning, particularly from storage and networking perspectives
Proven ability to work collaboratively within cross-functional teams and communicate technical concepts clearly
Experience with incident response, postmortem analysis, and continuous improvement methodologies

**Desirable Skills and Experience:**

Experience with container orchestration platforms, particularly Kubernetes
Hands-on experience with Docker and virtualisation technologies
Proficiency in managing monitoring stacks, including Prometheus and Grafana
Experience with CI/CD systems such as GitLab tools or Spinnaker
Knowledge of infrastructure-as-code frameworks, particularly Terraform
Experience managing databases such as PostgreSQL or equivalent relational database management systems
Experience with artifact repositories and Docker registries
Familiarity with cloud platforms (Google Cloud Platform, Amazon Web Services, or Microsoft Azure)
Understanding of distributed systems architecture and principles
Experience with performance tuning and system optimisation
Knowledge of security best practices in infrastructure and systems design
On-call support experience and comfort with incident response responsibilities

Additional Information:

Arista stands out as an engineering-centric company. Our leadership, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right.

We hire globally into our diverse team. At Arista, engineers have complete ownership of their projects. Our management structure is flat and streamlined, and software engineering is led by those who understand it best. We prioritize the development and utilization of test automation tools.

Our engineers have access to every part of the company, providing opportunities to work across various domains. Arista is headquartered in Santa Clara, California, with development offices in Australia, Canada, India, Ireland, and the US. We consider all our R&D centers equal in stature.

Join us to shape the future of networking and be part of a culture that values invention, quality, respect, and fun.

Remote Work:

Employment Type:

Full-time

About Company

Arista Networks

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and sof ...
