The people at UiPath believe in the transformative power of automation to change how the world works. We're committed to creating category-leading enterprise software that unleashes that power.
To make that happen, we need people who are curious, self-propelled, generous, and genuine. People who love being part of a fast-moving, fast-thinking growth company. And people who care about each other, about UiPath, and about our larger purpose.
Could that be you?
UiPath is seeking a Principal Site Reliability Engineer who excels across the full reliability stack, not limited to a single silo. You will help define how reliability is architected, scaled, measured, and automated across our large-scale, cloud-native systems. This role requires broad technical judgment, systems thinking, and the ability to influence reliability outcomes across product and platform teams.
This role is about shaping how reliability works at UiPath, not just firefighting outages or writing code. You will partner with engineering and platform teams to embed reliability into systems, workflows, and culture. You will help raise the bar for how we observe, automate, and ensure our systems scale reliably under real-world load and failure conditions.
You will own service reliability, observability, automation, and continuous improvement initiatives, and partner with our Romania- and India-based application teams as needed.
End-to-End Reliability Ownership - Define and evolve reliability strategy for distributed systems, balancing availability, performance, velocity, and cost through clear SLIs/SLOs and error budgets.
Incident Response & Operational Excellence - Lead and contribute to high-severity incidents, drive structured troubleshooting under ambiguity, and ensure durable, systemic improvements.
Observability & Operational Insights - Define and promote strong observability practices so that service health and performance risks are visible and actionable.
Automation, Tooling & Engineering Rigor - Automate manual operational work through tooling and self-service, applying disciplined engineering practices.
Infrastructure, Cloud & IaC - Drive reliable, scalable cloud infrastructure using Infrastructure as Code, and collaborate with platform teams on best practices.
Technical Leadership & Org Impact - Influence standards, mentor senior engineers, and elevate operational reliability across the organization.
Engineering & Reliability Experience
7 years of experience in SRE, platform, cloud, or infrastructure engineering roles, with a track record of improving reliability for production systems.
Demonstrated ability to define and operationalize SLIs and SLOs, and to use frameworks like error budgets to align reliability with user impact and business goals.
Systems Thinking & Distributed Systems Fundamentals
Strong conceptual understanding of distributed systems, performance bottlenecks, failure modes, and the trade-offs inherent to large-scale systems.
Scripting & Tooling
Proficiency in at least one programming language (e.g., Python, Go, or similar) used to build automation, internal tooling, and reliability workflows.
Experience developing tools and automation to reduce operational toil and improve system reliability.
Cloud & Infrastructure Expertise
Hands-on experience working with one or more major cloud providers (Azure, AWS, GCP), with practical knowledge of networking, deployments, and scaling.
Experience with Infrastructure as Code (e.g., Terraform, Pulumi) and container orchestration (e.g., Kubernetes) in production contexts.
Observability & Operational Practices
Proven experience with monitoring/observability stacks (metrics, logs, traces) and building meaningful dashboards and alerts that improve reliability signals.
Incident Response & Post-Incident Learning
Experience participating in and improving incident response and blameless postmortems, and implementing systemic fixes rather than symptomatic patches.
Collaboration & Influence
Ability to partner with product, infrastructure, and engineering teams to influence architecture and reliability practices without direct authority.
Nice to Have
Experience with chaos engineering, resilience testing, or performance optimization.
Exposure to service mesh, reliability scoring frameworks, or AIOps tooling.
Maybe you don't tick all the boxes above, but still think you'd be great for the job? Go ahead, apply anyway. Please. Because we know that experience comes in all shapes and sizes, and passion can't be learned.
Many of our roles allow for flexibility in when and where work gets done. Depending on the needs of the business and the role, the number of hybrid, office-based, and remote workers will vary from team to team. Applications are assessed on a rolling basis, and there is no fixed deadline for this requisition. The application window may change depending on the volume of applications received, or may close immediately if a qualified candidate is selected.
We value a range of diverse backgrounds, experiences, and ideas. We pride ourselves on our diversity and inclusive workplace that provides equal opportunities to all persons regardless of age, race, color, religion, sex, sexual orientation, gender identity and expression, national origin, disability, neurodiversity, military and/or veteran status, or any other protected classes. Additionally, UiPath provides reasonable accommodations for candidates on request and respects applicants' privacy rights. To review these and other legal disclosures, visit our .
Required Experience:
Staff IC
We deliver the most advanced Enterprise #RPA Platform, built for business and IT. As you strive to benefit in the Automation First Era, your digital transformation accelerates here. More than 2,750 enterprise customers and government agencies use UiPath's Enterprise RPA platform to r ...