Head of Site Reliability Engineering

dLocal

Job Location:

Barcelona - Brazil

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Why should you join dLocal?

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world's fastest-growing emerging markets.

By joining us, you will be a part of an amazing global team that makes it all happen. Being a part of dLocal means working with 1000 teammates from 30 different nationalities and developing an international career that impacts millions of people's daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.

What's the opportunity?

As we continue to grow globally and increase the complexity and scale of our systems, we're strengthening our focus on Site Reliability Engineering.

We are looking for a Head of Site Reliability Engineering (SRE) to lead the SRE division and take end-to-end ownership of reliability across our platform.

In this role you will:

Define and drive the SRE strategy, vision, and roadmap for dLocal.
Lead and grow a multi-region SRE organization, including SRE Technical Referents and engineers at different seniority levels.
Partner closely with Product, Engineering, and Platform leaders to ensure we can scale safely, with clear reliability guardrails and strong operational excellence.

This is a high-impact, hands-on leadership role, reporting to the VP of Cloud Platform, for someone who can move comfortably between strategy, architecture, and execution while coaching and empowering a senior, distributed team.

What will you do?

Strategy and leadership

Own the global reliability strategy for dLocal's platforms and services, aligning SRE goals with company and product objectives.
Define and socialize SRE standards and principles (SLIs/SLOs/SLAs, error budgets, production readiness, incident management practices, capacity planning, etc.).
Lead the SRE division: set org structure, define roles and scopes, and drive hiring, performance, and career development.
Build a culture of high ownership, continuous improvement, and data-driven decisions across all reliability-related work.

Reliability, operations, and observability

Ensure our most critical systems meet or exceed availability, latency, and performance targets.
Oversee and continuously evolve incident management (on-call strategy, incident response, communication, postmortems, follow-ups, and KPIs).
Own the strategy for observability and monitoring (metrics, logs, traces) and alerting across all environments, including tool selection, standards, and adoption.
Drive operational excellence: reduce toil via automation, improve deployment safety, and standardize production practices across teams.

Architecture and technical direction

Partner with Architecture, Platform, and Product Engineering leaders to define reliable, scalable architectures for our core systems and critical flows.
Guide the adoption of best practices in automation and Infrastructure as Code (IaC) across SRE and dependent engineering teams.
Sponsor and oversee large cross-team reliability programs, such as major observability migrations, resilience testing frameworks, or reliability improvements for key products.
Provide senior technical leadership on capacity planning, performance engineering, resilience, and disaster recovery.

People and cross-functional collaboration

- Lead, mentor, and coach SRE Leaders, Technical Referents, and senior ICs, helping them grow in both technical depth and leadership.
- Collaborate closely with:
  - Product & Engineering to balance feature delivery and reliability.
  - Security, Cloud Platform, and Infrastructure to ensure secure and robust foundations.
  - Business stakeholders (e.g., Operations, Support, Commercial) to align on reliability expectations and SLAs.
- Communicate clearly about risk, trade-offs, and priorities to both technical and non-technical audiences, including senior leadership.

Which skills do you need?

Must-have

Solid experience leading SRE / Production Engineering / Platform teams in high-availability, high-scale environments (fintech, payments, or similarly critical domains is a plus).
Proven track record managing managers and senior ICs, building and scaling distributed technical teams.
Deep hands-on expertise in:
- Reliability engineering: SLIs/SLOs, error budgets, capacity planning, resilience, and disaster recovery.
- Incident management: on-call models, incident response, postmortems, continuous improvement of incident processes.
- Observability and monitoring: metrics, logs, traces, alerting strategies, and ecosystem of tools.
- Automation and IaC: strong familiarity with modern CI/CD pipelines, configuration management, and infrastructure as code.
Ability to shape technical strategy, translate it into a clear roadmap, and ensure consistent execution across multiple teams.
Excellent communication and influencing skills; comfortable driving alignment across Engineering, Product, and non-technical stakeholders.
Strong analytical and problem-solving skills; able to operate effectively in ambiguous, fast-changing contexts.
Professional proficiency in English; comfortable working in a global, multi-timezone, multicultural environment.

Nice to have

Experience in payments / fintech or other regulated, mission-critical industries.
Hands-on background as an SRE, Senior/Staff Engineer, or Platform Engineer before moving into leadership.
Experience implementing or maturing:
- Centralized observability platforms and unified alerting strategies.
- Standardized production readiness reviews and reliability sign-off processes.
- Chaos engineering / resilience testing practices.

What do we offer?

Besides the tailored benefits we have for each country, dLocal will help you thrive and go that extra mile by offering you:

- Flexibility: we have flexible schedules and we are driven by performance.

- Fintech industry: work in a dynamic and ever-evolving environment with plenty to build and boost your creativity.

- Referral bonus program: our internal talents are the best recruiters - refer someone ideal for a role and get rewarded.

- Learning & development: get access to a Premium Coursera subscription.

- Language classes: we provide free English, Spanish, or Portuguese classes.

- Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections!

- dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We've got your back!

Flexibility in how you work: We focus on impact and productivity over fixed hours. This means our teams have flexible schedules and, depending on your role and location, you will combine self-managed focus time with moments of in-person connection in our collaboration hubs.

What happens after you apply?

Our Talent Acquisition team is invested in creating the best candidate experience possible, so don't worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!

Also, you can check out our webpage, LinkedIn, and YouTube for more about dLocal!

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Required Experience:

Director
