SRE - Prometheus
Columbus, NE - USA
Job Summary
| Required Qualifications: |
| 8 years of Software Engineering experience |
| 4 years of experience in Site Reliability Engineering teams with a continued focus on improving platform health |
| Familiar with Agile or other rapid application development practices |
| Hands-on expertise in building dashboards using APM tools. |
| Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases. |
| Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ and Kafka. |
| Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc. |
| Able to create dashboards using GCL/Splunk/ELK and set up alerts. |
| Working knowledge of CI/CD is a plus: source control such as Git, and continuous integration and release tools such as Jenkins / UCD, etc. |
| Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency. |
| Shell scripting and DevOps tools such as Ansible, with good knowledge of YAML for writing playbooks. |
| Experience with distributed storage technologies such as NFS, as well as dynamic resource management frameworks and platforms (PCF, Kubernetes / OpenShift, AWS, or Azure). |
| Tech stack: Java/J2EE (Spring, Spring Boot), Python, shell scripting, Kafka, Oracle, MongoDB, etc. |
| A proactive approach to spotting problems, areas for improvement, and performance bottlenecks. |