Software Engineer - Incident Management

Job Location

New York City, NY - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams. We operate at high scale (trillions of data points per day), providing always-on alerting, metrics visualization, logs, and application tracing for tens of thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.

The Incident Management SRE team at Datadog fosters a resilient culture by using incidents as learning opportunities and catalysts for growth. We collaborate closely with teams across departments to enhance the on-call experience, incident response, and post-incident analysis, reducing friction and optimizing tooling and processes. Our efforts empower Datadog to navigate unexpected failures confidently, efficiently, and with a commitment to continuous learning and systems improvement.

At Datadog, we place value in our office culture: the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace so that our Datadogs can create the work-life harmony that best fits them.

What You'll Do:

  • Steer the on-call experience for the company by establishing best practices and building platforms to support on-call rotations and compensation.
  • Define how we respond to incidents and write software to streamline the process, collaborating with product teams as needed. Our aim is to fully support our incident responders in dealing with complexity.
  • Contribute to the company's postmortem process, collaborating with teams on writing postmortems and identifying opportunities to reduce friction and enhance their learning value for the organization. Our team also runs a weekly postmortem reading group.
  • Support various teams in facilitating incident reviews that emphasize learning and blamelessness, and help them share their learnings across the organization to improve the resilience of our people.
  • Train our on-callers in incident and postmortem processes, both introducing newcomers to on-call responsibilities and refreshing the knowledge of existing engineers.
  • Engage in cross-functional collaborations with teams across the organization, embedding in their group for a few weeks to either learn how work is performed or help them improve on-call practices.

Who You Are:

  • At least 3 years of experience building software that solves real user problems, designing new features with RFCs, and collaboratively reviewing others' code and documents. We develop in Go and Python, with a bit of TypeScript.
  • Familiarity with Kubernetes and distributed systems, along with an understanding of their potential failure scenarios.
  • Interest in analyzing incidents, identifying broader risk patterns, and effectively sharing findings for others to understand and learn from.
  • Experience being on call, responding to incidents, and iteratively improving incident response processes.
  • Empathy, collaboration, and communication skills in English to cultivate strong relationships across various teams in the organization.
  • Willingness to teach and train other engineers on best practices. Experience driving cross-functional change and leading through influence, or a strong interest in doing so.

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you're passionate about technology and want to grow your skills, we encourage you to apply.

Benefits and Growth:

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, including the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

The benefits and growth opportunities listed above may vary based on the country of your employment and the nature of your employment with Datadog.

Employment Type

Full Time
