Job Title: Systems Analyst 3 (Site Reliability Engineer)
Location: Austin, TX
Job Duration: 4 months
Position Overview:
We are seeking an experienced Systems Analyst with a strong focus on Site Reliability Engineering (SRE). This role involves ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering principles to infrastructure and operations.
The ideal candidate will partner with development teams to build resilient, observable, and automated platforms aligned with defined Service Level Objectives (SLOs).
Key Responsibilities:
- Analyze business objectives and technical requirements to propose effective solutions
- Perform cost/benefit analysis and evaluate alternative approaches
- Gather and document user requirements, workflows, and system processes
- Design, implement, and support highly available distributed systems
- Collaborate with cross-functional teams to improve system performance and reliability
- Develop detailed documentation, including system designs, runbooks, and reports
- Monitor system performance and implement improvements for scalability and efficiency
- Lead incident response, root cause analysis (RCA), and postmortem processes
- Implement monitoring, alerting, and logging best practices
- Ensure security and compliance are integrated into system operations
Minimum Requirements:
- 8 years of experience in Systems Engineering, DevOps, or Site Reliability Engineering
- Strong expertise in Linux/Unix systems and system internals
- Proficiency in programming/scripting (Python, Go, Java, Bash)
- Experience designing and operating highly available distributed systems
- Hands-on experience with cloud platforms (AWS or GCP)
- Experience with containerization and orchestration tools (Docker, Kubernetes)
- Strong knowledge of monitoring, alerting, and logging frameworks
- Experience defining and managing SLIs, SLOs, and error budgets
- Familiarity with incident management, RCA, and postmortem practices
- Experience integrating security and compliance into operational workflows
Preferred Qualifications:
- Experience with observability tools (Prometheus, Grafana, Datadog, Splunk, etc.)
- Experience supporting 24x7 production environments and on-call rotations
- Familiarity with chaos engineering and resiliency testing
- Experience with canary deployments, feature flags, and progressive delivery
- Strong documentation skills (runbooks, dashboards, operational standards)
For more details reach at
Required Experience:
IC