What Are We Looking For
Please note that under Federal and FedRAMP regulations, hiring for this role is limited to US citizens.
FedRAMP staff may be subject to customer or third-party background checks, up to and including secret clearance, if required by their role at SentinelOne.
We are looking for a Staff Site Reliability Engineer (SRE) to join the Site Reliability Engineering team at SentinelOne. This organization's mission is to keep our uptime promise to our customers by ensuring we meet our SLOs/SLAs, help our engineering teams ship software to our customers fast and with quality, and ensure our customers are successful. We are looking to add a Staff SRE who has experience running incident post-mortems, automating repetitive operational tasks, improving alerting accuracy, and building and refining processes that reduce downtime. You will work closely with cross-functional teams to lead reliability initiatives and bring best practices to our team.
We value good written communication skills, data-driven decisions, and a keen eye for continuous improvement. You'll help simplify, have a passion for new ideas, and know how to execute iteratively toward the final goal. We value candor and collaboration.
What Will You Do
- Lead and execute incident management for production issues, ensuring rapid recovery and root cause analysis
- Improve and optimize the observability strategy
- Collaborate with application engineering teams to design and implement monitoring solutions that enhance our alerting capabilities and reduce noise
- Develop and refine SLOs, SLIs, and SLAs that align with business objectives and customer expectations
- Conduct post-incident reviews, documenting findings and driving follow-up actions to prevent recurrence
- Mentor and support other engineers in incident response, troubleshooting techniques, and reliability best practices
What skills and knowledge should you bring
- 8 years of experience in Site Reliability Engineering, DevOps, or a related field in cloud-native environments
- Strong expertise in incident management processes and the ability to lead complex troubleshooting efforts under pressure
- Experience with Kubernetes and container orchestration
- Experience with industry-standard observability stacks (Prometheus, Grafana, ELK, OpenTelemetry, etc.)
- Proficiency in Python and Bash scripting to improve operational workflows and incident response
- Familiarity with modern CI/CD pipelines and DevOps practices
- Excellent communication skills, with demonstrated ability to lead and mentor engineers in reliability practices
Why us
You will work on real-world problems and make an impact by protecting our customers from cyber threats. You will join a cutting-edge project and will be able to influence the architecture, design, and structure of our core platform. You will tackle extraordinary challenges and work with the very best in the industry.
- You will be joining a cutting-edge company where you will tackle extraordinary challenges and work with the very best in the industry
- Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid company holidays
- Paid sick time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement
- Numerous company-sponsored events, including regular happy hours and team-building events
Required Experience:
Staff IC