Role: Application Operational Support / Site Reliability Engineer
Location: Irving, TX or Charlotte, NC (Hybrid)
Type: Contract
Role Summary
- We are seeking a highly skilled Application Operational Support / Site Reliability Engineer to support and operate mission-critical enterprise applications in a highly regulated environment. This role is responsible for ensuring platform reliability, availability, and operational excellence through strong CI/CD practices, observability, incident management, and customer-facing remediation.
- The ideal candidate combines strong technical troubleshooting skills with disciplined operational practices and the ability to work independently with stakeholders.
Key Responsibilities
- Support production and pre-production environments to ensure high availability, performance, and stability of enterprise applications.
- Support and maintain CI/CD pipelines using tools such as GitHub Actions, Harness, or similar.
- Partner with engineering teams to improve deployment reliability, reduce manual steps, and enable repeatable releases.
- Assist with deployment automation and release coordination across environments.
- Execute Incident, Change, and Problem Management processes using ServiceNow.
- Lead or contribute to major incident calls, ensuring clear communication, coordination, and resolution.
- Perform root cause analysis and drive permanent fixes through problem management practices.
- Monitor application and platform health using tools such as Splunk, Grafana, AppDynamics, or equivalent.
- Configure dashboards, alerts, and monitoring thresholds to proactively identify issues.
- Use telemetry data to identify performance bottlenecks and reliability risks.
- Partner with application, infrastructure, and security teams to resolve complex cross-functional issues.
- Identify operational gaps and recommend improvements to tooling, processes, and automation.
- Contribute to runbooks, operational documentation, and standard operating procedures.
- Support platform modernization initiatives aligned with reliability and scalability goals.
Required Skills & Experience
Core Skills
- 5 years of experience in application/platform operations, production support, or SRE roles.
- 3 years of experience with CI/CD pipelines (GitHub Actions, Harness, or similar tools).
- Solid understanding of Incident, Change, and Problem Management processes, preferably using ServiceNow.
- 2 years of experience with observability and monitoring tools such as Splunk, Grafana, AppDynamics, or equivalent.
- Excellent troubleshooting and critical-thinking skills, with the ability to diagnose complex production issues.
- Proven experience interacting directly with customers or business stakeholders during operational events.
Technical Competencies
- Strong understanding of application deployment, runtime environments, and system dependencies.
- Ability to read logs, metrics, and traces to identify root causes.
- Familiarity with cloud-native or hybrid enterprise environments.
Nice-to-Have Skills
- Experience with VM image creation/build processes.
- Exposure to OpenShift / OCP or Kubernetes-based platforms.
- Experience operating in regulated environments (banking, financial services).