Hello,
My name is Bhushan. I just received details on a job that I believe you would be a great fit for. Please take a look below and let me know if you are interested. If you are not interested, I would also appreciate it if you could recommend someone looking for a similar role.
Job Title: Site Reliability Engineer (SRE)
Location: Hybrid - Austin, TX (onsite 3x a week; locals only)
Duration: 6-12 month contract
Interview process: Video
VISA: USC / GC only
LinkedIn: Required (profile must have been created before 2020)
Job Description:
Note from Manager: Looking for a senior-level expert in managing the Kubernetes platform itself, not in deploying applications on Kubernetes.
Description:
We're searching for a driven Site Reliability Engineer (SRE) to join our innovative team. As an SRE, you'll be a cornerstone of our production software, ensuring our systems are uncompromisingly reliable, secure, and scalable. Your expertise will be vital in maintaining constant uptime, seamless scalability, and a thriving environment for new applications and services. The ideal candidate is a highly motivated self-starter with a passion for excellence, quality, and meticulous attention to detail.
This role goes beyond traditional SRE work. You'll not only keep our systems running smoothly but also collaborate closely with developers and architects. Together, you'll design and implement solutions for improved stability, security, and scalability.
Responsibilities:
Production & Non-Production Environments: Operate, monitor, and prioritize tasks across all production and non-production environments, demonstrating strong operational focus.
Innovative Problem Solver: Design, build, and implement innovative software solutions to address existing challenges and proactively anticipate future needs.
Documentation & Collaboration: Create clear alert-handling procedures and runbooks, ensuring knowledge transfer and collaboration within and between SRE teams.
Automation Champion: Automate service deployment and orchestration in the cloud environment, as well as other routine processes, to streamline operations and reduce toil.
Resilience & Growth: Actively participate in capacity planning, scale testing, and disaster recovery exercises, ensuring our systems remain resilient.
Team Player: Foster strong relationships and provide support to partner teams such as engineering, QA, and program management.
Key Qualifications:
Cloud Platforms: Experience with major public cloud providers and their cloud-native services. Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
Container Technologies: Proficiency in using Kubernetes to deploy, operate, and troubleshoot container-based applications.
SRE Principles: Adherence to SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation. Strong focus on reliability, availability, and performance.
Telemetry and Observability: Expertise in implementing and coordinating telemetry using tools like Splunk, Grafana, and Prometheus. Ability to analyze and troubleshoot complex system issues.
Programming: Proficiency in Python and Go (Golang) for developing automation scripts, tools, and custom applications.
Collaboration: Excellent interpersonal and communication skills. Ability to work effectively in cross-functional teams and foster a collaborative environment.
Education & Experience:
Technical (Engineering or Computer Science) BS/MS degree or equivalent work experience
Full-time