Stability AI is the enterprise-ready creative partner for teams and creators, delivering professional-grade generative AI tools and solutions for media generation and editing across image, video, 3D, and audio to enable creative production at scale. Stability AI sparked the generative AI revolution with the release of Stable Diffusion in August 2022, putting generative technology in the hands of millions of creators globally and cementing its position as a leader in the field. Stable Diffusion models have since been downloaded more than 350 million times.
Recognized by Fortune as one of the 50 AI Innovators and by TIME as one of the Most Influential Companies, with Stable Audio named to TIME's Best Inventions, Stability AI entered its next phase of growth in June 2024 with the appointment of a renowned leadership team: Sean Parker as Executive Chairman, Prem Akkaraju as CEO, and James Cameron as Board Member.
Location: Remote - United States
Job Description:
Stability AI's Engineering Operations team is looking for a Senior Site Reliability Engineer (SRE) to join our growing team and play a pivotal role in improving and shaping our cloud infrastructure. This person will work closely with engineering, IT, security, and product teams to drive innovation and reliability in an evolving environment. Candidates should have the initiative to build and improve a maturing cloud landscape.
Responsibilities:
- Developing and enforcing SRE best practices and standards across the organization.
- Architecting and managing scalable systems in AWS and other cloud environments, focusing on high availability and resilience.
- Implementing and maintaining infrastructure as code using Terraform.
- Setting up and refining monitoring, logging, and alerting systems.
- Driving incident management and root cause analysis to improve system reliability.
- Championing SRE principles and mentoring junior team members.
- Collaborating with development teams to enhance CI/CD pipelines.
Qualifications:
- Experience scaling resource-intensive systems, whether storage, networking, or compute.
- Knowledge of and experience with Kubernetes or other container scaling solutions.
- Background in software development or automation scripting.
- Knowledge of and experience with Grafana, the ELK stack, or similar tools.
- Cloud security experience.
Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or other legally protected statuses.
Required Experience:
Senior IC