Support Engineer – Incident Management, Automation & Monitoring in Financial Services

Pune - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Synechron is seeking a skilled Production Support Engineer with expertise in automation, incident management, and Site Reliability Engineering (SRE) practices. The role involves supporting critical banking and financial applications, automating operational processes, and ensuring high system availability and reliability. You will work closely with cross-functional teams to troubleshoot and resolve incidents and to implement automation solutions that enhance operational efficiency. Your contributions will support business continuity and operational excellence in a regulated environment.

Software Requirements

Required:

Proficiency in Python scripting (3 years) for automating operational tasks and incident response
Experience with incident-tracking and monitoring tools such as Jira, ServiceNow, and TWS (7 years)
Strong troubleshooting and problem-solving skills for complex distributed systems
Experience handling production outages, managing incident escalation, and conducting post-incident reviews
Knowledge of stakeholder communication and documentation practices in support environments

Preferred:

Experience with automation frameworks or tools such as Ansible or Shell scripting
Familiarity with cloud platforms like AWS or Azure (desired)
Exposure to DevOps practices and tools for continuous improvement

Overall Responsibilities

Lead and manage high-priority incidents (P1/P2) to ensure rapid resolution and minimal impact on business operations
Conduct root cause analysis and implement permanent fixes to reduce recurring issues
Develop and maintain automation scripts in Python to streamline incident management and support processes
Collaborate with support, infrastructure, and application teams to improve system reliability and monitoring strategies
Support platform health by configuring and maintaining alerting, logging, and observability tools
Lead efforts to reduce Mean Time to Resolution (MTTR) and Mean Time to Detection (MTTD) through automation and process improvements
Document procedures, solutions, and best practices for incident response and operational workflows
Participate in on-call rotations supporting 24/7 system availability and incident escalation
Assist with platform monitoring, capacity planning, disaster recovery, and performance tuning
Drive continuous improvement initiatives to enhance operational efficiency, toolsets, and incident management workflows

Technical Skills (By Category)

Programming & Automation (Essential):

Python (script development, automation, support tasks)

Preferred:

Shell scripting or PowerShell for operational scripting

Monitoring & Logging Tools:

Jira, ServiceNow, and TWS for incident tracking and automation triggers
Prometheus, Grafana, or similar tools for system monitoring and alerting (desired)
Log management and analysis tools for troubleshooting

Incident & Problem Management:

Expertise with root cause analysis, incident handling, and escalation workflows
Understanding of ITIL principles (preferred)

Cloud & Infrastructure (Desired):

Basic familiarity with AWS or Azure support environments (preferred)

Supporting Tools:

Automation and container platforms such as Ansible, Docker, and Kubernetes (favorable)

Experience Requirements

3 years of experience supporting production applications in enterprise or financial systems
Proven ability to troubleshoot and resolve high-severity incidents efficiently
Experience with incident management, root cause analysis, and post-incident reporting
Hands-on automation experience for operational support tasks
Exposure to cloud-based platforms or hybrid environments is a plus

Day-to-Day Activities

Respond to and resolve critical incidents impacting banking and financial services applications
Automate support workflows, incident resolution, and operational tasks using Python and scripting tools
Monitor system health, review alert thresholds, and improve observability dashboards
Conduct root cause analysis and implement permanent solutions to prevent recurring issues
Collaborate with cross-functional teams to align support activities with business priorities
Support deployments, system upgrades, and capacity planning activities
Document incident procedures, operational workflows, and best practices
Participate in active incident management, escalation, and post-mortem reviews
Support continuous improvement initiatives to optimize operational workflows and automate manual tasks

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field
3 years' experience in production support, incident management, and automation in enterprise environments
Practical experience with support tools like Jira, ServiceNow, and monitoring platforms
Ability to work effectively in a fast-paced, highly regulated environment

Professional Competencies

Critical thinking and excellent troubleshooting skills for incident resolution
Strong communication skills to collaborate with technical and business stakeholders effectively
Leadership qualities to manage support activities and mentor junior team members
Adaptability to evolving systems, tools, and industry standards
Proactive approach to incident prevention and automation-driven support solutions
Ability to prioritize tasks and work efficiently under pressure

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference," is committed to fostering an inclusive culture that promotes equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, successful businesses. We encourage applicants from diverse backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

About Company

Synechron

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. The progressive technologies and strategies ...