Senior Site Reliability Engineer

New York City, NY - USA

Monthly Salary: Not Disclosed

Posted on: 4 hours ago

Vacancies: 1 Vacancy

Job Summary

About the Company

Stellar is a decentralized public blockchain that gives developers the tools to create experiences that are more like cash than crypto. The network is faster, cheaper, and far more energy-efficient than most blockchain-based systems. It's designed so that Stellar's ecosystem can make a real-world, lasting impact.

About the Role

SDF is looking for a Senior Site Reliability Engineer to help build and operate the foundation that powers our engineering teams. You'll ensure the reliability and scalability of our systems, design and improve the infrastructure behind our production environments, and automate operational work so developers can focus on building great products.

Key Responsibilities

Maintain, improve, scale, and secure our AWS/GCP infrastructure and Linux systems.
Assist our development teams in running, packaging, deploying, and troubleshooting applications.
Work with developers on streamlining deployment processes with Jenkins and other CI/CD tooling.
Build, maintain, monitor, and improve our Kubernetes clusters.
Work with development teams on migrating applications to Kubernetes.
Be responsible for maintenance and improvements to multiple internal services, for example Kubernetes, Prometheus, and ELK.
Monitor, triage, and respond to alerts in our high-availability environments.
Participate in design and code reviews and ensure that the foundation for our services is best in class.
Evaluate new technologies, then design and implement them as appropriate.
Identify automation opportunities and implement them, either by building custom solutions or by using off-the-shelf ones.

Requirements

5 years of experience working in cloud-based systems operations as an SRE or DevOps engineer.

First-hand experience with configuration management and infrastructure as code (Ansible, Puppet, Terraform).

Proficient in applying SRE methodologies such as capacity planning and disaster recovery testing to ensure the scalability, resilience, and availability of critical services.

Production experience building and maintaining Kubernetes clusters.

Must know how to code.

Bonus Skills

Ability to understand Go, Rust, C, and TypeScript source code.
Experience experimenting with AI-driven approaches to operations.
Comfortable participating in on-call rotations and conducting thorough root cause analyses to keep systems running smoothly.
Experienced in managing production workloads and skilled in using monitoring tools to detect issues early.
A strong understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
No blockchain experience needed.
Experience using AI is a plus.

About Company

TechChain Talent
