Responsibilities include:
- Incident Management: create and manage the necessary processes involving incidents
- Partner with Ops Control to ensure IT and/or End User communications are handled appropriately
- Engage with the development team throughout the life cycle to support building applications for reliability
- Develop software to automate manual operational work
- Run, maintain, and improve the service against established Service Level Objectives by applying software engineering principles
- Take responsibility for the availability, performance, change (CP) management, monitoring, and capacity management of their services
- Troubleshoot priority incidents, conduct blameless post-mortems, and ensure permanent closure of the incidents
- Analyze patterns of production incidents, develop permanent remediation plans, and implement automation through software engineering to prevent future incidents
- Manage process-related functions around large-scale events such as disaster recovery. Communicate closely with impacted groups to ensure all events are properly managed.
Primary Skills / Must have
- Site Reliability Engineer (SRE) role in which 80% of the work will be support (React/Protect) and 10% will be in the DevOps/Enable space.
- Proven track record supporting large-scale, multi-tiered, cloud-based applications.
- Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
- Hands-on experience with Java, Angular, Spring, DB2, and Unix scripting, plus experience with scheduler tools such as TWS and AutoSys
- L2-L3 production support, debugging, and problem-solving skills
- Experience working in an Agile Development environment
- Proven ability to understand and troubleshoot complex problems under pressure
- Excellent communication skills (both written and oral), listening skills, and influencing and negotiation skills
- Experience with performance troubleshooting and remediation
- Experience with observability tools such as Splunk, Kibana, Grafana, and Prometheus
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
Secondary Skills / Desired skills
- Strong expertise in Linux and shell scripting; must be very comfortable with Linux
- Grafana/Kibana dashboarding experience
- Good problem-solving skills
- Good communicator
- Good understanding of the brokerage business
- Job scheduling (Control-M/CBSS/cron) experience
- Bachelor's/Master's degree in Computer Science, Information Systems, or a related field