Application Operational Support / Site Reliability Engineer

Irving, TX, USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Role: Application Operational Support / Site Reliability Engineer

Location: Irving, TX or Charlotte, NC (Hybrid)

Type: Contract

Role Summary

We are seeking a highly skilled Application Operational Support / Site Reliability Engineer to support and operate mission-critical enterprise applications in a highly regulated environment. This role is responsible for ensuring platform reliability, availability, and operational excellence through strong CI/CD practices, observability, incident management, and customer-facing remediation.
The ideal candidate combines strong technical troubleshooting skills with disciplined operational practices and the ability to work independently with stakeholders.

Key Responsibilities

Support production and pre-production environments to ensure high availability, performance, and stability of enterprise applications.
Support and maintain CI/CD pipelines using tools such as GitHub Actions, Harness, or similar.
Partner with engineering teams to improve deployment reliability, reduce manual steps, and enable repeatable releases.
Assist with deployment automation and release coordination across environments.
Execute Incident, Change, and Problem Management processes using ServiceNow.
Lead or contribute to major incident calls, ensuring clear communication, coordination, and resolution.
Perform root cause analysis and drive permanent fixes through problem management practices.
Monitor application and platform health using tools such as Splunk, Grafana, AppDynamics, or equivalent.
Configure dashboards, alerts, and monitoring thresholds to proactively identify issues.
Use telemetry data to identify performance bottlenecks and reliability risks.
Partner with application, infrastructure, and security teams to resolve complex cross-functional issues.
Identify operational gaps and recommend improvements to tooling, processes, and automation.
Contribute to runbooks, operational documentation, and standard operating procedures.
Support platform modernization initiatives aligned with reliability and scalability goals.

Required Skills & Experience

Core Skills
5 years of experience in application/platform operations, production support, or SRE roles.
3 years of experience with CI/CD pipelines (GitHub Actions, Harness, or similar tools).
Solid understanding of Incident, Change, and Problem Management processes, preferably using ServiceNow.
2 years of experience with observability and monitoring tools such as Splunk, Grafana, AppDynamics, or equivalent.
Excellent troubleshooting and critical thinking skills with the ability to diagnose complex production issues.
Proven experience interacting directly with customers or business stakeholders during operational events.

Technical Competencies

Strong understanding of application deployment, runtime environments, and system dependencies.
Ability to read logs, metrics, and traces to identify root causes.
Familiarity with cloud-native or hybrid enterprise environments.

Nice-to-Have Skills

Experience with VM image creation/build processes.
Exposure to OpenShift / OCP or Kubernetes-based platforms.
Experience operating in regulated environments (banking, financial services).
