Software Delivery - Site Reliability Engineer

Job Location

Cupertino, CA - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

What You'll Do:

  • Ensure System Reliability: Design, build, and maintain robust, scalable, and observable systems for our core software delivery services.
  • Automate: Reduce operational toil by developing automation and tooling to prevent and rapidly resolve production issues.
  • Improve Incident Response: Own and refine our incident management processes to ensure high availability.
  • Collaborate with Engineers: Partner with development teams to create elegant, high-quality solutions that support the entire workflow from source code to customer release.
  • Improve and Modernize Systems: Use a proactive approach to identify and eliminate technical debt to enhance long-term reliability and [...]

Our Team:

We are a team dedicated to engineering excellence, reusable design, and simplicity. We foster a supportive, growth-focused culture where we mentor each other and work together to build resilient, high-quality [...]

What You'll Bring:

We know that great talent comes from a variety of backgrounds, and we encourage you to apply even if you don't meet every single requirement. The most important thing is a deep commitment to building reliable systems and strong collaboration with team members across different time zones.


  • Experience as a Site Reliability Engineer, DevOps Engineer, or Software Engineer focused on infrastructure in a large-scale distributed environment.
  • Strong software development skills in a language like Swift, Go, or Python, and a high degree of comfort with shell scripting (Bash).
  • Hands-on experience building and managing systems with container orchestration tools (Kubernetes, Docker).
  • Deep understanding of networking (TCP/IP, DNS, HTTP) and experience using observability tools (monitoring, logging, tracing) to diagnose complex issues.
  • Excellent problem-solving and communication skills with a strong sense of ownership and drive.


  • Proven experience leading initiatives to reduce technical debt, refactor systems, or improve performance and latency.
  • Expertise in performance analysis and capacity planning for global distributed systems.
  • Experience with large-scale distributed databases (e.g. Cassandra, FoundationDB) or messaging systems (e.g. Kafka).
  • Demonstrated ability to lead incident response for high-impact outages.
  • Familiarity with using Generative AI (GenAI) or Large Language Models (LLMs) to accelerate operational tasks such as automating runbooks, generating scripts, or analyzing incident data.

Employment Type

Full-Time
