IT Infrastructure Production Support Engineer

Stefanini Group


Job Location: Shanghai, China

Monthly Salary: Not Disclosed
Posted on: 23 hours ago
Vacancies: 1 Vacancy

Job Description

Position Summary:

The IT Infrastructure Production Support Engineer provides advanced technical support and troubleshooting for enterprise infrastructure in the Asia region, with deep expertise in virtualization, data storage, and networking technologies. This critical escalation role requires strong diagnostic skills to rapidly identify issues across multiple technology domains and to coordinate with specialized teams for swift resolution and minimal business impact.

Key Responsibilities:

Production Support & Incident Management:

* Serve as primary escalation point for critical production incidents affecting virtualization, Windows/Linux OS, storage infrastructure, and enterprise networking

* Perform rapid root cause analysis across infrastructure layers to identify and isolate issues

* Coordinate incident response and engage specialized teams (network, security, compute, application) based on technical assessment

* Monitor infrastructure health using tools (SolarWinds, LiveNX, Nagios) and proactively identify potential issues

* Maintain incident documentation and contribute to post-incident reviews

* Participate in 24/7 on-call rotation for production support coverage

Technical Troubleshooting & Problem Resolution:

* Troubleshoot complex issues spanning operating systems, storage arrays, backup solutions, and cloud platforms

* Diagnose and resolve performance issues related to compute, storage, and network infrastructure

* Perform break-fix activities and system performance tuning

* Identify network-related issues and coordinate with network engineering teams for resolution

* Execute disaster recovery procedures and business continuity plans when required

Cross-Functional Collaboration:

* Partner with Enterprise Infrastructure, Compute, Security, Network, and Application teams

* Effectively communicate technical issues to both technical and non-technical stakeholders

* Identify patterns in incidents and work with engineering teams to implement permanent solutions

* Collaborate with project teams during infrastructure changes to ensure smooth transitions to production

Documentation & Knowledge Management:

* Create and maintain comprehensive system documentation including troubleshooting procedures and runbooks

* Document incident resolution steps and contribute to knowledge base

* Develop automation scripts to streamline support activities

Basic Qualifications/Professional Skills:

* B.S. degree in computer science, information technology, or a computer-related discipline, or 5-7 years of IT work experience in a multi-site global infrastructure environment

* Demonstrated progressive advancement, with proven troubleshooting and problem-solving abilities

* Fluent in English; Mandarin proficiency preferred

* Strong communication, collaboration, and interpersonal skills

* Self-motivated with keen attention to detail and excellent judgment under pressure

* Ability to manage multiple concurrent incidents in high-pressure situations

* Team player with customer-focused mindset

Technical Skills/Experience:

Virtualization and OS Systems (Strong/Required):

* Proven experience with VMware in large-scale virtualized environments

* Experience with virtual machine troubleshooting and performance optimization

* Strong troubleshooting skills for Windows/Linux operating system issues

* Deep understanding of Red Hat and other Linux distributions (CentOS, RHEL, Oracle Linux, SUSE Linux)

* Experience with Red Hat Satellite and automation solutions such as Ansible or Puppet

* Proficiency in scripting languages including Shell, Ruby, and Perl for automation

Storage & Backup (Strong/Required):

* 5 years of experience with enterprise storage and backup solutions

* Experience with multiple storage platforms including Dell/EMC, NetApp, and Pure

* Knowledge of image-level backups, array-based replication, and hypervisor-based replication

* Experience with storage configuration and volume management (LVM, MPIO, EMC PowerPath)

* Familiarity with SAN/NAS operations and monitoring tools

* Understanding of data lifecycle management and tiering strategies

Network Knowledge (Working Knowledge/Required):

* Strong understanding of network topology concepts and technologies

* Ability to identify network-related issues and determine appropriate escalation path

* Knowledge of core LAN/WAN network technologies

* Familiarity with Cisco networking technologies and basic troubleshooting

* Understanding of network security concepts and protocols

* Ability to work with network teams to diagnose connectivity and performance issues

* Knowledge of load balancers and network accelerators

Additional Technical Skills:

* Strong understanding of network and server security

* Experience with converged hardware platforms including Dell, HPE, and Cisco

* Experience with system monitoring tools and techniques

Required Attributes:

* Problem Solver - Uses rigorous logic and systematic methods to diagnose and resolve complex technical issues quickly

* Communication - Can effectively communicate across all levels of the organization, including technical and non-technical people, both verbally and in writing

* Collaborative - Effective at working with cross-functional teams globally to resolve incidents

* Calm Under Pressure - Maintains composure and clear thinking during critical production incidents

* Customer-Focused - Committed to minimizing business impact and ensuring positive user experience

Preferred Certifications:

* ITIL Foundation

* Red Hat Certified Engineer (RHCE)

* VMware VCP

* Cisco CCNA

* AWS Certified Solutions Architect or Azure Administrator

Job Location: Shanghai





Required Experience:

IC


About Company

Created in 1987, Stefanini is a $1B global IT provider of business solutions with locations in 40 countries across the Americas, Europe, Australia and Asia. With more than 25,000 employees, Stefanini provides onshore, offshore and nearshore IT services, including application development ...
