Site Reliability Engineer, Technical Referent

dLocal

Job Location:

Rome - Italy

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Why should you join dLocal?

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world's fastest-growing emerging markets.

By joining us, you will be a part of an amazing global team that makes it all happen. Being a part of dLocal means working with 1000 teammates from 30 different nationalities and developing an international career that impacts millions of people's daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.

What's the opportunity?

We are looking for a Site Reliability Engineer (SRE) to join our team! As our SRE, you will focus on the design, implementation, and continuous maintenance of our centralized observability platform, which is built on OpenTelemetry (OTEL). You will be part of a talented team that works on mission-critical applications with big customers like Netflix, Amazon, Nike, Facebook & more!

As a Site Reliability Engineer you are always expected to ask the necessary questions:

What data do we need to understand how our systems are performing?

How do we collect this data?

What patterns are we looking for in the data, and what do they mean?

Who should be notified when a certain system is not working properly?

Do we have any systems that we need more data for?

An SRE designs systems and processes to answer the questions above and to provide automated support and response where possible.

What will you do?

Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across the three main signals (logs, metrics, and traces), ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability.
Empower Engineering Teams: Build self-service automation and tooling that enable development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry.
Support Incident Management: Be the engineering side of our Incident Management Team, designing the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident.
Collaborate Across Teams: Interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO/SLA requirements, and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
Automate Observability Infrastructure: Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL pipelines.
Define Baseline Observability Standards: Design base-level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level.
Own Technical and Security Health: Take full ownership of dLocal's infrastructure reliability, ensuring adherence to key availability and security KPIs.
Optimize Alerting Systems: Continuously refine alerting signals to minimize noise and ensure they are always actionable, reducing fatigue and improving response efficiency.

Which skills do you need?

Over 4 years of experience as an SRE or in a very similar role with a stronger focus on observability.
Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices.
Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization.
Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog.
Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar).
Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows.
Strong scripting abilities (Python, Go, or similar) for automating observability tasks.
A problem-solving mindset, with the ability to collaborate across multi-functional teams to drive reliability improvements.

You will stand out if you have:

Cloud experience, especially AWS and ECS-based workloads.
Experience managing observability pipelines at scale in high-throughput environments.
Familiarity with Configuration-as-Code (Ansible, Chef, or SaltStack) for managing configurations across legacy instances.
Database performance monitoring experience, particularly in large-scale distributed environments.

What do we offer?

Besides the tailored benefits we have for each country, dLocal will help you thrive and go that extra mile by offering you:

- Flexibility: we have flexible schedules and we are driven by performance.

- Fintech industry: work in a dynamic and ever-evolving environment with plenty to build and boost your creativity.

- Referral bonus program: our internal talents are the best recruiters - refer someone ideal for a role and get rewarded.

- Learning & development: get access to a Premium Coursera subscription.

- Language classes: we provide free English, Spanish, or Portuguese classes.

- Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections!

- dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We've got your back!

Flexibility in how you work: We focus on impact and productivity over fixed hours. This means our teams have flexible schedules, and depending on your role and location, you will combine self-managed focus time with moments of in-person connection in our collaboration hubs.

What happens after you apply?

Our Talent Acquisition team is invested in creating the best candidate experience possible, so don't worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!

Also, you can check out our webpage, LinkedIn, and YouTube for more about dLocal!

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
