Site Reliability Engineer

Arlington Heights, WA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

(Local candidates only; a face-to-face (F2F) interview is required)

Project Overview

Client is seeking a Site Reliability Engineer to join the Application Recovery Team (ART).

This team provides 24x7x365 monitoring and operational support for United's digital platforms, including mobile applications. The role focuses on proactive monitoring, incident response, service restoration, and system reliability across production, cloud, and distributed environments.

This is a high-visibility operations role requiring strong technical troubleshooting skills and the ability to respond effectively in real-time production environments.

Key Responsibilities

Monitor application performance and overall system health across production environments
Respond to alerts generated by enterprise monitoring tools
Analyze and correlate alerts to determine impact and root cause
Restore services quickly during incidents to minimize downtime
Support ITIL-based Change, Incident, and Problem Management processes
Participate in planned system changes and deployments
Escalate high-impact incidents appropriately
Collaborate cross-functionally with:
- DevOps teams
- Application Support
- Server Operations
- Network Operations
- Middleware teams
- Database teams
- Digital Operations Center
Identify performance abnormalities and system risks
Maintain enterprise security standards
Contribute to automation and scripting improvements
Support continuous uptime and reliability of digital channels

Required Qualifications

1-2 years of experience in an operational support or production support environment
Experience with enterprise application monitoring and APM tools
Knowledge of distributed systems
Intermediate systems administration experience (Linux and Windows)
Understanding of virtualization technologies
Familiarity with ITIL service management practices
Strong written and verbal communication skills
Ability to multitask and prioritize in a fast-paced environment
Strong analytical and troubleshooting capabilities

Technical & Soft Skills

Operating Systems

Unix/Linux (HP-UX, AIX, Solaris, Linux)
Windows Server

Cloud & Virtualization

AWS
VMware
Hyper-V

Monitoring Tools (Strong Experience Required)

Dynatrace (highly preferred)
Datadog
AppDynamics
BigPanda
SCOM
LogicMonitor

Middleware (Troubleshooting Required)

WebLogic
WebSphere
DataPower
Messaging technologies

ITSM / Ticketing

ServiceNow (preferred)
ITIL service management processes

Additional Technical Areas

Cloud infrastructure
Virtualization platforms
Middleware
Database systems
Storage and backup technologies

Scripting (Preferred)

Shell scripting
Python
Automation scripting

About Company

DKMRBH Inc
