Senior or Lead SRE

Job Location:

Irving, TX - USA

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1

Job Summary

Job Title: Senior or Lead SRE
Location: Irving, Texas (locals only)
Work Type: Onsite
Job Type: Contract to Hire
Rate: $75/hr on W2

Notes:

  • Must be able to support PST, CST, and EST time zones on a rotational basis
  • Must not require any sponsorship now or in the future (US Citizen/Green Card only)
  • Interview: initial video interview, with a potential in-person round
  • $5 million in E&O Network Security and Privacy Liability coverage is required
  • LinkedIn profile required

Job Description:
Senior or Lead SRE with hands-on experience making Java and SQL code changes and an automation mindset.

The candidate must submit a short, high-signal summary. For example:

Years of SRE / production experience

Nature of on-call responsibilities

One concrete incident they handled

Examples of Java, SQL, and infrastructure changes they've made

Why you believe they're senior-level and production-ready

The candidate must also answer these questions at the time of submission:

1. How comfortable are you supporting multiple time zones?

2. Walk me through your first steps when responding to a production incident.

3. Are you comfortable being on a production support rotation? Only candidates who answer Yes should be considered.

Position Description:

The Sr./Lead Site Reliability Engineer designs, enhances, and operates highly reliable, scalable, and observable production systems in an Azure-based environment. This role blends software engineering with systems administration to build resilient infrastructure, automate operations, and improve system performance. The engineer applies strong engineering principles to operational challenges, with a focus on reliability, automation, observability, and continuous improvement.

Core responsibilities include engineering-led incident response, implementing permanent corrective actions, reducing operational toil, and proactively preventing failures. The role contributes to code fixes, owns Dynatrace-based observability, and delivers custom reliability and operational reporting to improve system health and availability. Participation in a scheduled on-call rotation is required.

Minimum Requirements

4-year degree in Computer Science, Information Systems, or Engineering, or relevant experience. (Degree, university, and year must be on the resume.)

8 years of site reliability experience.

Advanced SRE Leadership Responsibilities:

Provide technical leadership for SRE practices across multiple services or platforms.

Define and evolve reliability standards, operational best practices, and incident response frameworks.

Influence system architecture and design decisions to ensure scalability, resilience, and operability.

Serve as a subject matter expert for reliability, availability, and production risk management.

Act as the lead escalation point for complex and business-critical production incidents.

Lead high-severity incident response, coordinating across engineering, platform, and security teams.

Drive blameless post-incident reviews and ensure corrective actions are prioritized and completed.

Improve on-call processes, escalation models, and incident response effectiveness.

Own the strategy and implementation of Dynatrace-based observability, including dashboards and alerting standards.

Establish and monitor reliability signals (availability, latency, error rates) across critical systems.

Identify reliability risks and lead mitigation initiatives before customer impact occurs.

Define and maintain leadership-level reliability and operational reporting.

Use production data to drive prioritization of reliability investments and operational improvements.

Communicate reliability posture, risks, and recommendations to senior engineering leadership.

Mentor and guide senior and mid-level SREs and production support engineers.

Support hiring, onboarding, and technical evaluation of SRE talent.

Collaborate with squad members to define iteration plans and commitments.

Ensure compliance with HIPAA and other security regulations.

Critical Skills:

Strong experience with monitoring and observability tools (Dynatrace experience is a plus).

Hands-on experience with GitHub Actions for CI/CD automation.

Proficiency in Docker and Kubernetes for containerization and container orchestration.

Familiarity with Azure cloud services.

Experience with Ansible.

Demonstrated experience in automation of infrastructure and operational processes using scripting or configuration management tools.

Java application changes (fixing production bugs; adding resiliency, error handling, or safeguards).

SQL / database changes (schema updates or migrations; indexing or query optimization; rolling changes out safely in production).

Knowledge of SRE principles (SLIs, SLOs, error budgets).

Automate repetitive operational work using Ansible, Python, Bash, or similar tools.
