Location: ONSITE 5 days a week in Des Moines, Iowa - 1776 West Lakes Pkwy, West Des Moines, IA 50266
Duration: Contract 9-10 months
Overview:
Seeking an experienced Observability and Monitoring Engineer to build and mature our enterprise-wide monitoring, logging, alerting, and observability capabilities across our AWS-based technology stack.
This role will define the strategy, architecture, implementation standards, and dashboards that enable proactive detection, faster troubleshooting, and data-driven insights across applications, infrastructure, operating systems, databases, file transfers, and batch processes.
The ideal candidate has hands-on engineering expertise, strong architecture skills, and the ability to unify multiple monitoring solutions into a cohesive observability framework.
Responsibilities:
You will establish standards for logs, metrics, traces, event correlation, and alerting across multiple environments.
You will build centralized dashboards and alerting policies that provide unified visibility across applications and services; operating systems; AWS services (EC2, RDS, Lambda, S3, CloudWatch, CloudTrail, etc.); databases (MS SQL Server, PostgreSQL, etc.); file transfer systems (SFTP, managed transfer tools); and batch jobs and scheduled processes.
You will create actionable, noise-free alerting thresholds, escalation policies, and runbooks.
You will integrate existing tools (Dynatrace, Graylog, Splunk, SolarWinds, Zabbix) into a cohesive ecosystem.
You will rationalize tool usage and recommend consolidation or modernization where appropriate.
You will manage the lifecycle, configuration, tuning, and health of monitoring and logging platforms; automate monitoring deployments using IaC (CloudFormation) and CI/CD pipelines; and develop reusable templates/standards so teams can onboard new applications quickly.
You will build self-service dashboards and reporting for technical/business stakeholders, and create documentation for monitoring standards, dashboard naming conventions, logging schemas, and alert configuration guidelines.
You will define SLOs/SLIs and reliability KPIs for critical services.
You will partner with scrum teams, infrastructure, and security teams to reduce MTTR and improve system reliability, and participate in incident resolution, root cause analysis, and problem management.
You will provide technical leadership/mentoring to team members and consult on architecture decisions and best practices.
You will develop/maintain system documentation and participate in project planning and technical strategy sessions.
Qualifications:
Bachelor's degree in Computer Science or related field
5 years of experience implementing monitoring and observability using Dynatrace
Hands-on experience with monitoring/logging tools such as Zabbix, Graylog, Splunk, SolarWinds, or equivalents
5 years of hands-on experience with AWS services and architecture
Deep understanding of metrics, logs, traces, distributed tracing, and event correlation
Experience building dashboards and KPIs for application, infrastructure, and database layers
Strong scripting/automation skills (Python, Bash, PowerShell) and familiarity with Terraform or CloudFormation
Strong understanding of network monitoring, performance tuning, and systems architecture
Familiarity with ITIL incident/problem management processes
Proficiency with AI tools and using them responsibly to improve observability preferred
Experience with container orchestration and microservices architecture preferred
Experience with AWS OpenTelemetry, Prometheus, Grafana, or similar tools preferred
Required Technical Skills:
AWS Services (EC2, RDS, S3, Lambda, ECS/EKS, etc.)
Configuration Management (Ansible, Puppet, Chef)
Monitoring Tools (Dynatrace, CloudWatch, Zabbix, SolarWinds, Graylog, etc.)
CI/CD Tools (Jenkins, QuickBuild, Bitbucket)
Scripting Languages (Python, PowerShell, Bash)
Database Management (MS SQL Server, PostgreSQL)
Infrastructure as Code (Terraform, CloudFormation)
Container Technologies (Docker, Kubernetes)
This is a high PRIORITY requisition. This is a PROACTIVE requisition
Background Check : No
Drug Screen : No