The Site Reliability Engineering organization at Pinterest is accountable for ensuring overall Pinterest availability, as well as enhancing Engineering teams' capability to design, build, and operate robust systems at scale. Pinterest's applications and infrastructure handle billions of monthly page views and petabytes of data as Pinterest continues to grow and scale. As a Pinterest SRE, you will design and build systems, platforms, tools, frameworks, and methodologies to assure the reliability of our large-scale distributed systems.
What You'll Do:
- Develop software solutions to enable reliability and operability of large-scale distributed systems handling petabytes of data and serving
- Build a deep understanding of how Pinterest's systems behave, scale, interact, and fail, and use that insight to identify risks and opportunities for remediation
- Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes, and best practices to be used across Pinterest Engineering
- Build meaningful, insightful, and actionable SLIs
- Automate critical portions of Pinterest's engineering processes to minimize risk and maximize the speed of innovation
- Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world
What We're Looking For:
- 4 years of industry experience building and operating large-scale, high-performance distributed systems
- Experience programming with Python or Go
- Strong knowledge of Linux/Unix/BSD internals and experience working with open source software (e.g., MySQL, Hadoop, Envoy, HAProxy, Nginx)
- Experience with technologies such as ElasticSearch, ZooKeeper, HBase, Hadoop, Memcache, and Kafka, with a focus on reliability, automation, operability, and performance
- Infrastructure as code a plus (e.g., Terraform, Puppet, Chef, Ansible, Salt, Fabric, Docker, etc.)
- Bonus points if experienced with deploying web apps to cloud infrastructure (AWS, etc.) and working with distributed service-oriented architecture
- Bachelor's degree in a relevant field such as Computer Science, or equivalent experience
In-Office Requirement Statement:
- We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
- This role will need to be in the office for in-person collaboration 1-2 times every 6 months and therefore can be situated anywhere in the country.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
Required Experience:
Senior IC