SRE - Prometheus

Shrive Technologies LLC

Job Location:

Columbus, OH - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Required Qualifications:

8 years of Software Engineering experience

4 years of experience in Site Reliability Engineering teams with continued focus on improving Platform health

Familiar with Agile or other rapid application development practices

Hands-on expertise in building dashboards using APM tools.

Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.

Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ and Kafka.

Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc.

Able to create dashboards using GCL/Splunk/ELK and set up alerts.

Working knowledge of CI/CD is a plus: source control like Git, continuous integration and release tools like Jenkins / UCD, etc.

Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency.

Shell scripting and DevOps tools like Ansible, with good knowledge of YAML for writing playbooks.

Experience with distributed storage technologies like NFS, as well as dynamic resource management frameworks such as PCF, Kubernetes / OpenShift, AWS, or Azure.

Tech Stack: Java/J2EE (Spring, Spring Boot), Python, Shell Scripting, Kafka, Oracle, MongoDB, etc.

A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
