Senior Site Reliability Engineer

Job Location

London - UK

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Gorgias is the conversational AI platform for ecommerce that drives sales and resolves support inquiries. Trusted by over 15,000 ecommerce brands, Gorgias supports everything from growing independent shops to globally recognizable brands.

Built for Shopify and powered by advanced ecommerce integrations, Gorgias's conversational AI understands your brand, tools, policies, and customers to drive personalized 1-to-1 conversations, from editing orders and initiating returns to making product recommendations. Gorgias: where every customer interaction feels personal, support becomes sales, and conversations shape success.

Relocate to one of: Paris, Lisbon, or Belgrade. Relocation and visa support provided.

About The SRE Team


We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team. As an SRE at Gorgias, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems, enabling the seamless delivery of our products and services.

The SRE team at Gorgias maintains the core infrastructure and services that make up the heart of our product. We have the privilege of working with high-throughput systems and TB-scale data stores serving billions of queries per day, most with sub-millisecond response times.

We also design and maintain the software delivery stack, offering features such as metrics-based canary rollout strategies to all internal development teams.

We currently have a team of 9 Senior and Staff SREs operating together globally, with the aim of growing to 12 in the near term. We focus on scalable methods that provide the largest impact across the organization.

Some achievements we're proud of:

  • Partitioned multi-TB tables in Postgres to reduce VACUUM time by 5x

  • For partitioning, we studied the problem and the partitioning strategy, analyzed all queries to avoid bad surprises, used Debezium and Kafka to perform a live copy, and completed the migration with a maintenance window of less than 20 minutes and no data loss

  • Split the PostgreSQL connection proxy into multiple pools to guarantee per-service quotas, so that sub-systems that hit the database heavily are contained and cannot create a large incident blast radius

  • For connection proxying, we had to go deeper into the backend: we proposed solutions, coded part of the fix in the backend, provided the migration path, and helped teams move to the new approach, ultimately eliminating incidents caused by DB connection starvation

  • Worked with all product-engineering teams to achieve SOC 2 certification, ran a HackerOne program, refactored our entire incident management process with Rootly for better visibility and faster resolution, and improved our overall security posture

  • To keep the lights on, the team is constantly upgrading our self-hosted Postgres and RabbitMQ, alongside other critical infrastructure components, with minimal downtime and high accuracy

What You Will Do:

  • Manage multi-TB PostgreSQL clusters in the public cloud; optimize parameters, storage settings, and data structures

  • Operate RabbitMQ and Redis at tens of thousands of operations per second

  • Manage 10 full-featured GKE clusters worldwide, with 10k tenants

  • Adopt a new stack: Kafka, Debezium, and Apache Flink

  • Facilitate rollout strategies at scale with GitLab CI and Argo CD

  • Roll out best practices around Kubernetes/Helm/Operators, SLIs/SLOs, incident management, observability, security, and disaster recovery to all product-engineering teams, and drive their adoption

  • Automate complex infrastructure pieces for our worldwide footprint, following IaC best practices with Terraform (TF) and strong scripting in Python/Go

What You Should Have:

  • Experience with cloud-native web systems at scale

  • Bachelor's degree in Computer Science or equivalent work experience.

  • 5 years of experience as a Site Reliability Engineer or similar role, with a focus on maintaining high-performance, scalable, and reliable high-throughput web systems.

  • Proficiency in using Kubernetes for container orchestration and management.

  • 5 years of experience with cloud providers (AWS, GCP) and a deep understanding of cloud services and architectures.

  • Proficient in scripting and programming languages such as Python, Bash, Go, or Node.js.

  • Comfortable and confident in Linux systems and the command line.

  • Solid understanding of infrastructure as code (IaC) principles and experience with tools like Terraform.

  • Experience with continuous integration and deployment (CI/CD) pipelines.

  • Excellent problem-solving and troubleshooting skills.

  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.

Bonus Points If You Have:

  • Certification in Kubernetes (e.g. Certified Kubernetes Administrator - CKA).

  • Certification in a cloud provider platform (e.g. AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).

  • Experience in managing and optimizing PostgreSQL databases.

Company Benefits and Perks

Diversity & Inclusion at Gorgias
We celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants of all backgrounds, experiences, and perspectives. At Gorgias, we believe that diverse teams drive innovation and better decision-making. We do not discriminate based on race, color, religion, gender identity, sexual orientation, disability, age, or any other protected status.

If you need accommodations to participate in the application or interview process, perform essential job functions, or access other employment benefits, please contact us at
. Let's grow together!


Required Experience:

Senior IC

Employment Type

Full-Time

Department / Functional Area

Engineering
