Site Reliability Engineer

DKMRBH Inc

Job Location:

Arlington Heights, WA - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

(Local candidates only, as a face-to-face (F2F) interview is required.)

Project Overview

The client is seeking a Site Reliability Engineer to join the Application Recovery Team (ART).

This team provides 24x7x365 monitoring and operational support for United's digital platforms, including and mobile applications. The role focuses on proactive monitoring, incident response, service restoration, and maintaining system reliability across production, cloud, and distributed environments.

This is a high-visibility operations role requiring strong technical troubleshooting skills and the ability to respond effectively in real-time production environments.

Key Responsibilities

  • Monitor application performance and overall system health across production environments
  • Respond to alerts generated by enterprise monitoring tools
  • Analyze and correlate alerts to determine impact and root cause
  • Restore services quickly during incidents to minimize downtime
  • Support ITIL-based Change, Incident, and Problem Management processes
  • Participate in planned system changes and deployments
  • Escalate high-impact incidents appropriately
  • Collaborate cross-functionally with:
    • DevOps teams
    • Application Support
    • Server Operations
    • Network Operations
    • Middleware teams
    • Database teams
    • Digital Operations Center
  • Identify performance abnormalities and system risks
  • Maintain enterprise security standards
  • Contribute to automation and scripting improvements
  • Support continuous uptime and reliability of digital channels

Required Qualifications

  • 1-2 years of experience in an operational support or production support environment
  • Experience with enterprise application monitoring and APM tools
  • Knowledge of distributed systems
  • Intermediate systems administration experience (Linux and Windows)
  • Understanding of virtualization technologies
  • Familiarity with ITIL service management practices
  • Strong written and verbal communication skills
  • Ability to multitask and prioritize in a fast-paced environment
  • Strong analytical and troubleshooting capabilities

Technical & Soft Skills

Operating Systems

  • Unix/Linux (HP-UX, AIX, Solaris, Linux)
  • Windows Server

Cloud & Virtualization

  • AWS
  • VMware
  • Hyper-V

Monitoring Tools (Strong Experience Required)

  • Dynatrace (highly preferred)
  • Datadog
  • AppDynamics
  • BigPanda
  • SCOM
  • LogicMonitor

Middleware (Troubleshooting Required)

  • WebLogic
  • WebSphere
  • DataPower
  • Messaging technologies

ITSM / Ticketing

  • ServiceNow (preferred)
  • ITIL service management processes

Additional Technical Areas

  • Cloud infrastructure
  • Virtualization platforms
  • Middleware
  • Database systems
  • Storage and backup technologies

Scripting (Preferred)

  • Shell scripting
  • Python
  • Automation scripting

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting