Senior Site Reliability Engineer

Washington, AR - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

What Are We Looking For

Please note that under Federal & FedRAMP regulations, hiring for this role is limited to US citizens only.

FedRAMP employees may be subject to customer or third-party background checks, up to and including secret clearance, if required by their role at SentinelOne.

We are looking for a Senior Site Reliability Engineer (SRE) to join the Site Reliability Engineering team at SentinelOne. This organization's mission is to keep our uptime promise to our customers by ensuring we meet our SLOs/SLAs, to help our engineering teams ship software to our customers fast and with quality, and to ensure our customers are successful. We are looking to add a Senior SRE who has experience running incident post-mortems, automating repetitive operational tasks, improving alerting accuracy, and building and refining processes that reduce downtime. You will work closely with cross-functional teams to lead reliability initiatives and bring best practices to our team.

We value good written communication skills, data-driven decisions, and a keen eye for continuous improvement. You'll help simplify, have a passion for new ideas, and know how to execute iteratively toward the final goal. We value candor and collaboration.

What Will You Do

Lead and execute incident management for production issues, ensuring rapid recovery and root cause analysis.
Improve and optimize the observability strategy.
Collaborate with application engineering teams to design and implement monitoring solutions that enhance our alerting capabilities and reduce noise.
Develop and refine SLOs, SLIs, and SLAs that align with business objectives and customer expectations.
Conduct post-incident reviews, documenting findings and driving follow-up actions to prevent recurrence.
Mentor and support other engineers in incident response, troubleshooting techniques, and reliability best practices.

What skills and knowledge should you bring

5 years of experience in Site Reliability Engineering, DevOps, or a related field in cloud-native environments.
Strong expertise in incident management processes and the ability to lead complex troubleshooting efforts under pressure.
Experience with Kubernetes and container orchestration.
Experience with industry-standard observability stacks (Prometheus, Grafana, ELK, OpenTelemetry, etc.).
Proficiency in Python and Bash scripting to improve operational workflows and incident response.
Familiarity with modern CI/CD pipelines and DevOps practices.
Excellent communication skills, with demonstrated ability to lead and mentor engineers in reliability practices.

Why us

You will work on real-world problems and make an impact by protecting our customers from cyber threats. You will join a cutting-edge project and will be able to influence the architecture, design, and structure of our core platform. You will tackle extraordinary challenges and work with the very best in the industry.

You will be joining a cutting-edge company where you will tackle extraordinary challenges and work with the very best in the industry
Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid company holidays
Paid sick time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events

Required Experience:

Senior IC

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root Cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

SentinelOne

A group of defense and intelligence experts saw savvy attackers compromising endpoints seemingly at will. Traditional approaches failed to provide sufficient protection. They founded SentinelOne to develop a dramatic new approach to endpoint protection. It’s one that applies AI and ma ...
