Principal Site Reliability Engineer

UiPath

Job Location:

Tokyo - Japan

Monthly Salary: Not Disclosed
Posted on: 8 hours ago
Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

Life at UiPath

The people at UiPath believe in the transformative power of automation to change how the world works. We're committed to creating category-leading enterprise software that unleashes that power.

To make that happen, we need people who are curious, self-propelled, generous, and genuine. People who love being part of a fast-moving, fast-thinking growth company. And people who care about each other, about UiPath, and about our larger purpose.

Could that be you?

Why not work with us at the cutting edge of agentic automation?

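Through end-to-end business automation, UiPath has long supported the efficiency and transformation of Japanese companies. Our focus now is agentic automation: coordinating AI agents, RPA robots, and people to automate work across the entire enterprise safely and reliably.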

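UiPath Japan (UiPath株式会社) has been elevated to a region reporting directly to headquarters and, under a strategy that positions Japan as a top-priority location, aims to deliver solutions from Japan to the world. UiPath is looking for curious, self-starting, agile people. We need colleagues who enjoy the speed and change of business, care for one another, and keep growing together. Let's make agentic automation a reality at UiPath and transform society together.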

Role Overview

This is a high-impact, principal-level role designed for an engineer who excels in the heat of the moment. Operating with a high degree of autonomy, you will take operational leadership to restore the stability of UiPath's large-scale distributed services, blending deep technical SRE expertise with the authoritative presence of an Incident Commander.

You will partner closely with platform, infrastructure, and application teams globally to improve service availability, reduce operational toil, and ensure our systems scale reliably under real-world load and failure conditions.

You will act as the Japan regional owner for SRE standards and maintain a close partnership and functional alignment with UiPath's Global SRE organization. You will also own service reliability, observability, automation, and continuous improvement initiatives for the region.

You will report primarily to the Senior Director of Japan and functionally to the Vice President of SRE, based in the U.S. You will also act in a managerial capacity, with one other team member reporting to you.

What You'll Be Working On

1. Incident Command & Tactical Response

Lead Incident Command: Act as the primary Incident Commander for high-stakes technical events. Establish command and control, orchestrate cross-functional response efforts (Compute, Network, Storage, Database), and maintain a common operating picture for all stakeholders.

Live Site Troubleshooting: Serve as a key escalation point for complex issues. Use your deep understanding of service topology and dependencies to diagnose grey failures and resolve disruptions promptly.

Executive Communication: Own the communication life cycle. Deliver real-time, executive-level briefings during active incidents, translating technical jargon into clear business impact and recovery timelines for leadership.

2. Prevention & Reliability Engineering

Post-Incident Evolution: Lead thorough retrospectives and RCAs. Beyond just documenting what happened, you will drive and influence the discovery and implementation of automated, self-healing solutions to ensure the same issue never occurs twice.

Observability: Define, track, and improve service health by promoting well-designed SLIs and SLOs. Influence and implement proactive monitoring, dashboards, and early-warning alerts to identify performance bottlenecks before they trigger an incident.

Toil Automation: Design and implement automation to reduce manual intervention during incidents and routine operations. Apply engineering rigor to operational workflows to eliminate repetitive and error-prone tasks.

Service Resilience: Know how to test service behavior under load, including degradation modes, scaling characteristics, and dependency failures. Ensure backup, restore, and disaster recovery capabilities are implemented, tested, and maintained.

3. Service Design & Cross-functional Leadership

Architectural Partnership: Partner with development teams to champion high availability and readiness of the services, and promote best practices on reliability, resilience, and operability.

Team Mentorship: Advocate for SRE best practices. Mentor and support other engineers, helping raise the overall incident response and reliability maturity of the organization.

What You'll Bring to the Team

Experience: 7 years in SRE, Cloud Operations, or a related technical field, with at least 3 years in a lead responder or command-oriented role.

Command Presence: Demonstrated ability to remain calm, focused, and decisive under extreme pressure. You can lead a room of diverse stakeholders and drive technical conversations to successful outcomes.

Forensics & Investigation: Skill in analyzing system artifacts, network data, and performance dashboards to guide a multi-disciplinary audience toward the likely root-cause areas of service failures.

Technical Breadth: Strong proficiency in Python or Go and a holistic understanding of distributed systems, Kubernetes, and cloud infrastructure (preferably Azure).

Observability Expertise: Deep experience with Prometheus/Grafana, OpenTelemetry, or an equivalent third-party observability stack.

Availability: Willingness to participate in the on-call rotation as an Incident Commander for high-severity issues.

Nice to have

Command Frameworks: Familiarity with structured command systems (such as the Incident Command System - ICS) used in crisis management.

LLM Ops: Experience using LLMs or AI-driven detection systems to solve reliability and capacity challenges in GPU-heavy, high-performance computing environments.

AI Tooling: Champion the use of AI tools and LLM-powered agents to improve SRE pillars, including but not limited to reducing operational toil.

Event-Driven Remediation: Proven history of building self-healing infrastructure via Terraform, Azure Service Operator, or any other equivalent solution.

Working Hours & Language Skills

Working Hours: The role follows a standard work schedule starting at 8:00 a.m. Flexibility may be required to support on-call rotations and respond to incidents, particularly those affecting customers in Japan.

Language Skills: Strong proficiency in English for effective communication with global functional team members, combined with Japanese proficiency to clearly convey incident details, root causes, and remediation plans to customers and local stakeholders in the Japanese market.

Maybe you don't tick all the boxes above, but still think you'd be great for the job? Go ahead, apply anyway. Please. Because we know that experience comes in all shapes and sizes, and passion can't be learned.

Many of our roles allow for flexibility in when and where work gets done. Depending on the needs of the business and the role, the number of hybrid, office-based, and remote workers will vary from team to team. Applications are assessed on a rolling basis, and there is no fixed deadline for this requisition. The application window may change depending on the volume of applications received, or may close immediately if a qualified candidate is selected.

We value a range of diverse backgrounds, experiences, and ideas. We pride ourselves on our diversity and inclusive workplace that provides equal opportunities to all persons regardless of age, race, color, religion, sex, sexual orientation, gender identity and expression, national origin, disability, neurodiversity, military and/or veteran status, or any other protected classes. Additionally, UiPath provides reasonable accommodations for candidates on request and respects applicants' privacy rights. To review these and other legal disclosures, visit our .


Required Experience:

Staff IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

We deliver the most advanced Enterprise #RPA Platform, built for business and IT. As you strive to benefit in the Automation First Era, your digital transformation accelerates here. More than 2,750 enterprise customers and government agencies use UiPath's Enterprise RPA platform to r ...
