Site Reliability Engineer
Full-time position
Direct Client Requirement
100% onsite in New York, NY
We are looking for a Site Reliability Engineer to join our Database Engineering organization. At Client, the Database Engineering organization owns the top-level reliability, observability, and availability of the Datastore platforms, including but not limited to Cassandra, Elasticsearch, and Kafka. The team contributes to projects, services, designs, and processes with the aim of stewarding good architecture, and it provides tools and services that enable software engineering teams to measure and meet their reliability agreements.
What You Bring To The Table
- Experience developing backend applications in Python or Java.
- Experience managing, working with, or developing for large Elasticsearch clusters in highly available 24x7 production environments.
- Experience automating infrastructure maintenance using Python and Ansible or similar tools.
- Experience managing automated cloud infrastructure on AWS or another major cloud provider.
- Experience managing large Cassandra clusters in production is a strong plus.
- Experience working with Docker is a plus.
- Ability to quickly learn new concepts and technologies and adapt to changing needs.
Our Tech
- Most of our internal tooling is written in Python.
- Most of our microservices are written in Java.
- Observability tools we use: Datadog, Splunk, and Lightstep.
- Our primary persistence store is Cassandra.
- We operate in three AWS regions.
- We primarily rely on AWS and its services: EC2, S3, SNS/SQS, ElastiCache, Lambda, etc.