drjobs ALTS - Lead SRE

Job Location

Jersey City - USA

Salary

$152,000 - $215,000

Vacancy

1 Vacancy

Job Description

We are seeking an experienced Lead Site Reliability Engineer (SRE) to manage and guide our team. The ideal candidate will have a strong foundation in SRE, DevOps, or infrastructure engineering, along with the leadership skills to drive team success in a fast-paced, dynamic environment. This role involves overseeing the team's execution, risk management, and strategic initiatives while fostering a collaborative and innovative culture.

Key Responsibilities:

Team Leadership and Management:

  • Lead, mentor, and develop a team of SREs, fostering a culture of collaboration and continuous improvement
  • Set clear goals and expectations for the team, ensuring alignment with business objectives
  • Facilitate regular team meetings and one-on-one sessions to support individual growth and team cohesion

Execution and Delivery:

  • Oversee the delivery of major themes of work, ensuring high-quality execution and timely completion
  • Guide the team in estimating delivery timelines and managing workloads effectively
  • Provide expert guidance in debugging and systems design, encouraging innovative solutions and trade-off analysis

Risk Management:

  • Assess the cross-impact of team deliverables and ensure potential risks are communicated proactively
  • Support the team in identifying technical limitations and suggesting remediation strategies

Strategic Vision and Forward Thinking:

  • Develop and implement strategic plans for building robust systems with strong contracts, anticipating future changes
  • Encourage the team to propose alternative requirements and solutions that better meet organizational needs
  • Set and prioritize the team's strategic book of work in line with the goals of the business

Communication and Stakeholder Engagement:

  • Communicate effectively with stakeholders, providing updates on progress and raising risks that could impact delivery
  • Ensure the team is aligned with the business vision and understands the importance of their contributions to the product

Qualifications:

  • Experience directly leading, or functioning as a lead of, technical teams with a focus on SRE, DevOps, or infrastructure engineering
  • Proficiency in programming languages (Python preferred) and distributed systems (Kubernetes, Kafka, Cassandra, etc.)
  • Experience setting up and using SLOs to track system health and performance
  • Excellent problem-solving skills and creativity in debugging complex issues
  • Deep understanding of cloud fundamentals and infrastructure management
  • Exceptional communication skills, with the ability to articulate technical problems and solutions to diverse audiences
  • A strategic mindset with a keen interest in automation and learning
  • A thorough understanding of the full stack of the system

An example of a task or problem to be tackled is below. Does leading a team that solves system-wide problems excite you?

Our system has been working properly in our UAT environment for the past few days. We deployed a new version of core infrastructure that was tested in dev; we found it to be working and then approved it for UAT release. Suddenly, one of our services is not starting, and our product and QA teams cannot test changes in this environment. We receive a ping or bug report that provides high-level information about what is happening, what the user would like to happen, and perhaps what they expect to happen. We ask you to take a look at the issue. Resolving this involves:

  • Asking questions of and communicating with the user to fully understand the issue
  • Understanding where in the stack to begin debugging
  • Constantly questioning your assumptions about the way the system should work
  • Being able to ask the right questions of your peers and team to triage an issue
  • Providing updates to stakeholders who are counting on you to identify or fix the problem
  • Using your technical skill set to identify and reproduce the issue
  • Communicating what you have found to the team so that we can best resolve the issue


Employment Type

Full-Time