Job Duties & Responsibilities
- Monitor network, system, and application performance across on-premises and Azure IaaS environments using tools such as Dynatrace, LogicMonitor, Databricks, and other integrated platforms
- Analyze and interpret monitoring data by building dashboards, reviewing trends, querying logs, and researching historical data to identify potential issues and proactively prevent outages
- Investigate alerts and anomalies, conducting thorough root cause analysis to understand triggering behaviors and determine appropriate next steps
- Manage incidents and requests using ServiceNow or similar ticketing systems to assign, track, and resolve issues efficiently
- Identify recurring problems and collaborate with cross-functional teams to implement automation, self-healing solutions, and other improvements that enhance performance and reliability
- Create and maintain documentation, including Standard Operating Procedures, Operations Playbooks, and Knowledge Base articles, using consistent formatting and style
- Troubleshoot complex technical issues involving advanced networking, DNS, DHCP, SMTP, web/application servers, and Azure infrastructure and PaaS services
- Develop advanced PowerShell scripts to support automation and operational improvements
- Participate in root cause analysis for service interruptions and assist in recovery efforts as needed
Skills, Knowledge & Experience
- 2 years of hands-on systems engineering experience, including work with Windows, Linux, and Azure Cloud environments
- Strong analytical and critical thinking skills with the ability to diagnose and solve complex technical issues
- Experience managing incidents in fast-paced or high-pressure environments
- Proficiency with Dynatrace or comparable monitoring tools (required)
- Experience with ServiceNow or similar ITSM platforms
- PowerShell scripting experience or equivalent automation skills preferred
- Deep troubleshooting expertise across network, system, infrastructure, and application layers, with the ability to see the whole picture and trace an issue from probable cause to end-user impact
- Excellent communication skills, including the ability to translate technical information for nontechnical audiences and understand business impacts during incidents
- Strong knowledge of application and infrastructure design principles, methodologies, and problem-solving approaches
- Familiarity with ITIL 4 foundational practices