(Local candidates only, as an F2F interview is a must)
Project Overview
Client is seeking a Site Reliability Engineer to join the Application Recovery Team (ART).
This team provides 24x7x365 monitoring and operational support for United's digital platforms, including mobile applications. The role focuses on proactive monitoring, incident response, service restoration, and maintaining system reliability across production cloud and distributed environments.
This is a high-visibility operations role requiring strong technical troubleshooting skills and the ability to respond effectively in real-time production environments.
Key Responsibilities
- Monitor application performance and overall system health across production environments
- Respond to alerts generated by enterprise monitoring tools
- Analyze and correlate alerts to determine impact and root cause
- Restore services quickly during incidents to minimize downtime
- Support ITIL-based Change, Incident, and Problem Management processes
- Participate in planned system changes and deployments
- Escalate high-impact incidents appropriately
- Collaborate cross-functionally with:
  - DevOps teams
  - Application Support
  - Server Operations
  - Network Operations
  - Middleware teams
  - Database teams
  - Digital Operations Center
- Identify performance abnormalities and system risks
- Maintain enterprise security standards
- Contribute to automation and scripting improvements
- Support continuous uptime and reliability of digital channels
Required Qualifications
- 1-2 years of experience in an operational support or production support environment
- Experience with enterprise application monitoring and APM tools
- Knowledge of distributed systems
- Intermediate systems administration experience (Linux and Windows)
- Understanding of virtualization technologies
- Familiarity with ITIL service management practices
- Strong written and verbal communication skills
- Ability to multitask and prioritize in a fast-paced environment
- Strong analytical and troubleshooting capabilities
Technical & Soft Skills
Operating Systems
- Unix/Linux (HP-UX, AIX, Solaris, Linux)
- Windows Server
Cloud & Virtualization
Monitoring Tools (Strong Experience Required)
- Dynatrace (highly preferred)
- Datadog
- AppDynamics
- BigPanda
- SCOM
- LogicMonitor
Middleware (Troubleshooting Required)
- WebLogic
- WebSphere
- DataPower
- Messaging technologies
ITSM / Ticketing
- ServiceNow (preferred)
- ITIL service management processes
Additional Technical Areas
- Cloud infrastructure
- Virtualization platforms
- Middleware
- Database systems
- Storage and backup technologies
Scripting (Preferred)
- Shell scripting
- Python
- Automation scripting