As a Senior Site Reliability Engineer joining TCE EngOps, you will design, build, and operate the systems and tooling that ensure the availability and reliability of critical services. You'll lead initiatives in observability, incident management, infrastructure automation, and performance. In this role, you'll collaborate closely with development teams, promote SRE best practices, and mentor peers to strengthen reliability culture across the organization.
You will participate in an L3 on-call rotation, driving rapid recovery during incidents and championing systemic improvements afterward. You'll also explore and apply emerging technologies, including AI-driven practices and tooling, to continuously improve reliability, automation, and developer experience. This position emphasizes both hands-on engineering and coaching/enablement, helping uplift the reliability capabilities of the broader engineering organization.
Responsibilities
Own the reliability, scalability, and performance of production services.
Define and implement SLOs/SLAs, error budgets, and capacity planning.
Design and evolve monitoring, alerting, and observability dashboards with tools such as Prometheus, Grafana, and Datadog.
Participate in incident response, blameless postmortems, chaos testing, and systemic remediation.
Drive safe release practices, including canary and blue-green deployments, rollback automation, and CI/CD improvements.
Provide performance and load testing tooling that enables developers to validate scalability and efficiency.
Apply cost optimization strategies to improve cloud spend efficiency.
Build and manage Infrastructure as Code with Terraform.
Operate and scale containerized services with Docker and Kubernetes.
Automate workflows and tooling using Python, Go, and Bash.
Implement cloud best practices in AWS (EC2, VPC, IAM, S3, Route 53).
Promote shift-left reliability practices through pre-launch reviews, CI quality gates, and risk identification.
Mentor, coach, and embed with engineering teams to share SRE practices and build reliability maturity.
In addition to a competitive base salary and benefits, this position is also eligible for equity awards based on factors such as experience, performance, and location.
8 years of SRE, DevOps, or Platform Engineering experience.
Proven expertise in designing SLOs, monitoring strategies, and incident response frameworks.
Strong proficiency with Terraform, GitLab CI/CD, and cloud infrastructure (AWS).
Hands-on experience with Kubernetes and Docker.
Skilled in Python, Go, or Bash for automation and tooling.
Experienced with Prometheus, Grafana, Datadog, or Splunk for observability.
Deep understanding of networking, security practices, and cloud cost optimization.
Strong collaborator with experience in developer enablement, coaching, and knowledge sharing.
Excellent communicator who values blamelessness, automation, and continuous improvement.
Committed to continuous learning and exploration of emerging technologies, including AI and automation, to drive reliability excellence.
At Zillow, we're reimagining how people move, through the real estate market and through their careers. As the most-visited real estate platform in the U.S., we help customers navigate buying, selling, financing, and renting with greater ease and confidence. Whether you're working in tech, sales, operations, or design, you'll be part of a company that's reshaping an industry and helping more people make home a reality.
Zillow is honored to be recognized among the best workplaces in the country. Zillow was named one of the FORTUNE 100 Best Companies to Work For in 2025 and included on the PEOPLE Companies That Care 2025 list, reflecting our commitment to creating an innovative, inclusive, and engaging culture where employees are empowered to grow.
No matter where you sit in the organization, your work will help drive innovation, support our customers, and move the industry, and your career, forward together.
Zillow Group is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. If you have a disability or special need that requires accommodation, please contact your recruiter directly.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable state and local law.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Required Experience:
Senior IC