Role: SRE Consultant
Location: Kansas City or Seattle, WA
Duration: 6 months
Description
Skills
- Production support expertise with SRE Observability experience:
- Proactive issue identification using observability tools.
- Skills in using different monitoring & observability tools to track system performance
- Production support activities, including proactive identification of issues using observability tools and correlating inputs from various dashboards & tools to drive resolution
- Experience in swiftly identifying probable failure points by analyzing multiple inputs: logs, observability dashboards, recent application changes, infra and network changes, etc.
- Basic troubleshooting on every layer of the tech stack (Application, Database, Infra (Container platforms), and Network)
- Experience in setting up observability dashboards based on Splunk logs
Technical expertise:
- Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
- Debugging of issues in VMs, Load balancers, Firewalls, API Gateways, DB, Network, Linux / Unix
- Debugging of issues in Containerization: Docker, Kubernetes, AWS, PCF, Azure
- Analysis of issues using APM, NMON, and Wireshark
- Database performance monitoring and analysis
- Experience in UEM and synthetic monitoring setup
- Experience in heap dump analysis, memory leak analysis, and resource optimization
Communication:
- Excellent communicator, expected to actively lead and triage proactively identified issues/incidents on calls where VPs/SVPs are also present.
- Leadership in triage calls - direct the teams on the actions to be taken during the call
Automation:
- Experience in toil identification and automation
- Flexibility to work in a 24x7 environment
Optional skills:
- ServiceNow (including AIOps tools for Self-Heal and automated playbooks)
- Development experience in some of the following technologies: Java, Python, AWS, Azure, Oracle, Cassandra, SQL Server, MySQL, and MongoDB