Site Reliability Engineer II Epic
Secaucus, NJ - USA
Job Summary
As a Site Reliability Engineer II-Epic, your role is to provide reliability engineering services through observability and performance engineering, using monitoring and performance tools to deliver detailed feedback to product owners and development teams. You will partner with Product Owners to define service level objectives and develop service levels, and work with cross-functional teams to design, build, automate, and maintain scalable infrastructure. Your responsibilities will include ensuring high availability, monitoring system performance, and aiding support staff with resolving incidents. This role requires a strong background in scripting and cloud platforms and a passion for optimizing operational efficiency. You will use Site Reliability Engineering practices to deliver a seamless user experience.
Pay Range: $00 plus yearly bonus
Salary offers are based on a wide range of factors, including relevant skills, training, experience, education, and, where applicable, certifications obtained. Market and organizational factors are also considered. Successful candidates may be eligible to receive annual performance bonus compensation.
Remote: This position, which supports Epic, can be remote, subject to certain criteria, if the candidate is not located near a hub.
This position is hybrid and will require 3 days on site at one of the following Quest sites: Secaucus, NJ or Schaumburg, IL.
Benefits Information: We are proud to offer best-in-class benefits and programs to support employees and their families in living healthy, happy lives. Our pay and benefit plans have been designed to promote employee health in all respects: physical, financial, and developmental. Depending on whether it is a part-time or full-time position, some of the benefits offered may include:
- Day 1 medical, supplemental health, dental & vision for FT employees who work 30 hours
- Best-in-class well-being programs
- Annual no-cost health assessment program
- Blueprint for Wellness
- healthyMINDS mental health program
- Vacation and Health/Flex Time
- 6 Holidays plus 1 MyDay off
- FinFit financial coaching and services
- 401(k) pre-tax and/or Roth IRA with company match up to 5% after 12 months of service
- Employee stock purchase plan
- Life and disability insurance plus buy-up option
- Flexible Spending Accounts
- Annual incentive plans
- Matching gifts program
- Education assistance through MyQuest for Education
- Career advancement opportunities
- And so much more!
Responsibilities
System Monitoring and Analysis:
- Implement and maintain robust observability solutions to monitor system performance, identify bottlenecks, and ensure optimal operation.
- Use tools to gather, analyze, and visualize key performance metrics.
Performance Optimization:
- Proactively identify and address performance bottlenecks through in-depth analysis and optimization strategies.
- Work closely with development teams to implement performance improvements and enhance overall system efficiency.
Capacity Planning:
- Conduct capacity planning exercises based on observed patterns and future growth projections.
- Collaborate with infrastructure and development teams to ensure adequate resources are available to meet system demands.
Automation and Scripting:
- Develop and maintain automation scripts for routine tasks, enabling efficient monitoring and response procedures.
- Implement automated processes for scaling and provisioning resources based on observed workload patterns.
Documentation:
- Document system architecture, configurations, and observability best practices to facilitate knowledge transfer and onboarding for team members.
- Keep documentation up to date to reflect changes in the system and its monitoring setup.
Collaboration with Development Teams:
- Work closely with software engineers to integrate observability tools into the development lifecycle.
- Provide guidance on building observable systems and assist in instrumenting applications for effective monitoring.
Continuous Improvement:
- Stay informed about industry best practices and emerging technologies related to observability and performance engineering.
- Drive continuous improvement initiatives to enhance the reliability and performance of systems.
Security and Compliance:
- Collaborate with security teams to implement monitoring and observability measures that align with security requirements and compliance standards.
- Participate in security incident response activities and contribute to ongoing security assessments.
Training and Knowledge Sharing:
- Conduct training sessions for team members and other stakeholders on observability tools, best practices, and performance engineering concepts.
- Foster a culture of knowledge sharing within the organization.
And other duties as assigned.
Qualifications
Required Work Experience:
- 4+ years of experience with multiple APM tools and extensive experience with Dynatrace
- 3+ years of SRE experience
- Experience in software development, infrastructure, or operations roles
- Certifications in relevant technologies (e.g., AWS, DevOps, Kubernetes, Dynatrace, Azure, etc.)
- Working experience building CI/CD pipelines and using version control systems
- Working experience with scripting languages (e.g., Python, Bash, Go, etc.)
- Excellent problem-solving and communication skills.
- Ability to work collaboratively in a fast-paced agile environment.
Preferred Work Experience:
- Working experience with NeoLoad, JMeter, or an equivalent performance testing tool.
- Experience executing software load and performance testing in an enterprise environment.
- Experience testing applications hosted in the cloud.
- Experience with infrastructure as code tools such as Terraform or CloudFormation.
- Deep understanding of Linux systems administration and networking principles.
- Experience with containerization and orchestration technologies such as Docker and Kubernetes.
- Experience or familiarity with IIS, HTML, Java, and JBoss.
- Experience in Chaos Engineering
- Programming experience in C, C++, Java, or other popular programming languages. Perl/Python/JavaScript scripting experience may be considered equivalent.
- Terraform and Ansible experience.
- Exposure to Splunk tools.
- Exposure to microservices.
- Dynatrace Certifications
- AWS/Azure/GCP Certifications
- Chaos Engineering Certifications
- Agile Certifications
Physical and Mental Requirements:
- Ability to sit/stand for long periods of time.
- Ability to handle high stress situations.
- Ability to lift up to 50 lbs.
Knowledge:
- Site Reliability Engineering Principles
- DevSecOps Principles
- Agile (SAFe)
- Healthcare industry
- ITIL
- ServiceNow
- Jira/Confluence
Skills:
- Dynatrace/Prometheus/Grafana
- NeoLoad/JMeter
- Splunk
- AWS/Azure/GCP
- SAFe Agile
- Strong communication skills (written/verbal)
- Time management
- Analytic problem solver
- Self-starter
- Results-oriented, with proven ability to organize priorities
Education
- Bachelor's degree in Computer Science, Engineering, or a related field (Required)
Licenses and Certifications
- Agile Certification (Project Management) (Preferred)
Required Experience:
IC
About Company
Quest Diagnostics (NYSE: DGX) empowers people to take action to improve health outcomes. Derived from the world's largest database of clinical lab results, our diagnostic insights reveal new avenues to identify and treat disease, inspire healthy behaviors and improve health care mana ...