Site Reliability Engineer III

Job Location

Addison, TX - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Overview

Concentra is recognized as the nation's leading occupational health care company.

With more than 40 years of experience, Concentra is dedicated to our mission to improve the health of America's workforce, one patient at a time. With a wide range of services and proactive approaches to care, Concentra colleagues provide exceptional service to employers and exceptional care to their employees.

The Site Reliability Engineer (SRE) III is responsible for ensuring that the underlying infrastructure and critical systems are working as expected and running smoothly. The SRE III also monitors critical applications, services, and infrastructure to minimize downtime and ensure their availability. The SRE III plays a large part in improving core system stability and successfully implementing DevOps practices. The SRE III applies engineering principles to operations, focusing on system reliability, scalability, and performance. This role balances the need for rapid feature development with the imperative of maintaining system stability and availability. The role emphasizes observability, proactive reliability engineering, continuous improvement, and collaboration across teams.

Responsibilities

  • Lead incident response efforts and conduct blameless postmortems to identify root causes and drive systemic improvements.
  • Define, monitor, and report on service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs).
  • Evolve the architecture to support future requirements based on SLIs, SLOs, and SLAs.
  • Identify and eliminate toil by automating repetitive operational tasks, thus increasing velocity and reliability.
  • Ensure management awareness of problems that are severe in nature or that are exceeding documented targets.
  • Ensure that all problems are resolved in a timely and efficient manner.
  • Own development of software to automate processes such as analyzing logs, testing production environments, and responding to any issues.
  • Develop software tasks in accordance with standards and methodologies.
  • Possess deep knowledge of the entire technology stack.
  • Participate in capacity planning, performance analysis, and system tuning to ensure scalability and resilience.
  • Collaborate with development teams to ensure reliability is considered during design and implementation phases.
  • Mentor others to accelerate their career growth and encourage participation.
  • Provide technical mentoring to junior SREs.
  • Help build team spirit by assisting other staff members and promoting a positive workplace.
  • Challenge team processes, looking for ways to improve them.
  • Recognize potential areas where policies and procedures require change or where new ones need to be developed, especially regarding future business expansion. Submit recommendations as appropriate.
  • Ensure all changes comply with change management policies and procedures.
  • Embody the philosophy of DevOps & Site Reliability Engineering by providing a prescriptive way of measuring and achieving reliability through engineering and operations work.
  • Monitor and report on any security violations related to the unwarranted access to corporate data.
  • Review outstanding issues daily to assure that troubleshooting and resolutions are current.
  • Collaborate cross-functionally with application engineering, QA, and infrastructure teams to ensure observability and reliability.
  • Perform tool evaluation and selection in support of observability and automation.

Qualifications

  • Education Level: Bachelor's Degree
  • Preferred experience includes AWS or Azure certifications.
  • Experience in lieu of required education is acceptable.
  • 7 years of total work experience in IT, software engineering, or infrastructure roles.
  • Minimum of 5 years of hands-on experience in Site Reliability Engineering, DevOps, or closely related roles.
  • At least 3 years of direct experience with AWS and/or Azure, including infrastructure provisioning, automation, and monitoring.
  • Experience with implementing, managing, and using observability tools, data visualization, and application monitoring platforms such as Dynatrace, AWS CloudWatch, Azure Monitor, Grafana, Prometheus, or Datadog.
  • Familiarity with error budgets and their role in balancing reliability and innovation.
  • Direct experience building, launching, configuring, and maintaining AWS and/or Microsoft Azure cloud resources.
    Expertise preferred in implementing methodologies for Automation, Continuous Integration, Continuous Delivery, High Availability, High Scalability, Monitoring, Logging, Security, and Governance.
  • Experience with Terraform and a strong understanding of Infrastructure as Code (IaC) principles.
  • Strong scripting knowledge using languages such as PowerShell, Bash, Python, Groovy, etc.
  • Proficiency in at least one programming language preferred (e.g., Python, Java).
  • Proficient in Git for version control and collaborative development.
  • Experience with GitLab or similar platforms for source code management and CI/CD.
  • Familiarity with Atlassian tools (Jira, Confluence) is a plus.
  • Proficient in administering Linux and/or Windows-based platforms.
  • Experience supporting production enterprise applications.
  • Strong understanding of complex multi-tiered environments and their integration with DevOps toolsets.
  • Experience in problem management, preventive maintenance, and analytical and conceptual problem solving.
  • Experience in business process improvement is also desired.

Job-Related Skills/Competencies

  • Ability to effectively multi-task and adapt to changing business priorities
  • Excellent attention to detail
  • Willingness to learn new technologies
  • Excellent analytical and problem-solving skills
  • Excellent time management and organizational skills
  • Proven drive towards continual improvement
  • Strong interpersonal and communication skills
  • Strong dedication to quality customer service
  • Must possess a personal sense of urgency
  • Strong analytical mindset for risk assessment and mitigation.
  • Ability to quantify reliability and communicate trade-offs.
  • Ability to assess and mitigate risks to system reliability through proactive engineering.
  • Skilled in quantifying reliability metrics and communicating their impact to stakeholders

Additional Data

Employee Benefits

  • 401(k) Retirement Plan with Employer Match
  • Medical, Vision, Prescription, Telehealth & Dental Plans
  • Life & Disability Insurance
  • Paid Time Off
  • Colleague Referral Bonus Program
  • Tuition Reimbursement
  • Commuter Benefits
  • Dependent Care Spending Account
  • Employee Discounts

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation if required.

*This job requires access to confidential and sensitive information requiring ongoing discretion and secure information management*

Concentra is an Equal Opportunity Employer, including disability/veterans
