Role: Splunk Subject Matter Expert (SME) and Enterprise Monitoring Engineer
Location: 3 - Atlanta, GA; 1 - Frisco, TX (Hybrid, 3 days onsite per week). LOCAL ONLY
Mandatory Skills: Splunk Enterprise, Splunk Dashboard Design, Monitoring Systems
Job Summary:
We are looking for a highly skilled Splunk Subject Matter Expert (SME) and Enterprise Monitoring Engineer to lead the design, implementation, and optimization of our monitoring and observability ecosystem. The ideal candidate will be an expert in Splunk with a strong background in enterprise IT infrastructure, system performance monitoring, and log analytics. You will play a pivotal role in ensuring end-to-end visibility across our systems, applications, and services.
Key Responsibilities:
Splunk Administration & Engineering
- Serve as the SME for Splunk architecture, deployment, and configuration across the enterprise.
- Maintain and optimize Splunk infrastructure, including indexers, forwarders, search heads, and clusters.
- Develop and manage custom dashboards, alerts, saved searches, and visualizations.
- Implement and tune log ingestion pipelines using Splunk Universal Forwarders, HTTP Event Collector, and other data inputs.
- Ensure high availability, scalability, and performance of the Splunk environment.
- Build dashboards, reports, alerts, advanced Splunk searches, and visualizations; implement log parsing and external table lookups.
- Expertise with SPL (Search Processing Language) and understanding of Splunk architecture, including configuration files.
- Broad experience monitoring and troubleshooting applications with tools such as AppDynamics, Splunk, Grafana, Argos, and OTEL to build observability for large-scale microservice deployments.
- Create dashboards for various applications to monitor health and network issues, and configure alerts.
- Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
- Establish and document runbooks and guidelines for using the multi-cloud infrastructure and microservices platform.
- Experience optimizing search queries using summary indexing.
- Solid knowledge and experience in monitoring the Splunk infrastructure.
- Develop a long-term strategy and roadmap for AI/ML tooling to support the AI capabilities across the Splunk portfolio.
- Diagnose and resolve network-related issues affecting CI/CD pipelines; debug DNS, firewall, proxy, and SSL/TLS problems; and use tools like tcpdump, curl, and netstat for proactive maintenance.
Enterprise Monitoring & Observability
- Design and implement holistic enterprise monitoring solutions, integrating Splunk with tools like AppDynamics, Dynatrace, Prometheus, Grafana, SolarWinds, or others.
- Collaborate with application, infrastructure, and security teams to define monitoring KPIs, SLAs, and alert thresholds.
- Build end-to-end visibility into application performance, system health, and user experience.
- Integrate Splunk with ITSM platforms (e.g., ServiceNow) for event and incident management automation.
Operations, Troubleshooting & Optimization
- Perform data onboarding, parsing, and field extraction for structured and unstructured data sources.
- Support incident response and root cause analysis using Splunk for troubleshooting and forensics.
- Regularly audit and optimize search performance, data retention policies, and index lifecycle management.
- Create runbooks, documentation, and SOPs for Splunk and monitoring tool usage.
Required Qualifications:
- 5 years of experience in IT infrastructure, DevOps, or monitoring roles.
- 3 years of hands-on experience with Splunk Enterprise as an admin, architect, or engineer.
- Experience designing and managing large-scale, multi-site Splunk deployments.
- Strong skills in SPL (Search Processing Language), dashboard design, and alerting strategies.
- Familiarity with Linux systems, scripting (e.g., Bash, Python), and APIs.
- Experience with enterprise monitoring tools and integration with Splunk (e.g., AppDynamics, Dynatrace, Nagios, Zabbix).
- Understanding of logging, metrics, and tracing in modern environments (on-prem and cloud).
- Strong understanding of network protocols, system logs, and application telemetry.
Preferred Qualifications:
- Splunk certifications (e.g., Splunk Certified Power User, Admin, Architect).
- Experience with Splunk ITSI, Enterprise Security, or Observability Suite.
- Knowledge of cloud-native environments (AWS, Azure, or GCP) and cloud monitoring integrations.
- Experience with log aggregation, security event monitoring, or compliance (e.g., PCI, HIPAA, SOX).
- Familiarity with CI/CD pipelines and GitOps practices.