** Note: You must be able to commute to our Cottonwood Heights, UT or Jacksonville, FL office as needed. We will prioritize candidates located closer to these offices. **
Senior Site Reliability Engineer
At SoFi, we're passionate about building resilient infrastructure that maximizes employee productivity.
Our Site Reliability Engineers (SREs) play a critical role in empowering our internal systems and services through observability and automation, enabling high availability, outstanding performance, and seamless user experiences.
As we expand our observability and automation efforts, we're seeking an experienced SRE to help evolve our SRE team toward best-in-class standards. This person will focus on automating toil-heavy workloads, optimizing network administration across multiple offices, and collaborating closely with cross-functional DevOps and operations teams.
Objectives of this role
- Observe and monitor the corporate production environment to conceptualize and assess holistic system health.
- Automate infrastructure around corporate services and applications to reduce manual effort for engineers and end users.
- Develop and manage SRE tools using our CI/CD infrastructure.
- Define and enforce standards that maintain high availability and deep observability across DevOps and operations teams.
- Implement measurement-driven SLA, SLO, and SLI strategies to proactively address areas of improvement and drive innovation.
- Provide escalation support for multi-site office networking footprints and cloud-based distributed applications.
- Advance corporate office networking toward a zero-touch provisioning model.
- Play a key role in building, mentoring, and evolving the SRE team toward industry best practices.
Responsibilities of this role
- Gather and analyze metrics from operating systems, network devices, cloud components, and applications for performance tuning and troubleshooting.
- Partner with DevOps teams to enhance services through rigorous testing and improved release procedures.
- Contribute to DevOps service design, platform management, and capacity planning.
- Identify systems that would benefit from automation and deliver projects to systematically remove toil.
- Balance feature development speed with system reliability, aligned with well-defined service-level objectives (SLOs).
- Ensure standardization and consistency of the network hardware footprint across all SoFi office locations.
- Streamline audit compliance activities by automating auditor access to required data and proofs.
- Lead initiatives to continuously evolve the SRE function and mentor team members.
Required skills and qualifications
- Bachelor's degree (or equivalent experience) in Computer Science or a related discipline.
- 5 years of proven experience in SRE roles.
- 3 years of senior-level experience in on-premises and cloud-based network engineering (routing/switching).
- Strong programming skills in one or more high-level languages: Python and Java are preferred, but we are open to C/C++, Ruby, or JavaScript.
- Practical experience managing infrastructure as code in cloud-based environments is essential. Familiarity with the following technologies in our stack is highly preferred:
  - Terraform
  - GitLab CI/CD
  - AWS Cloud Networking / CloudWatch
  - Datadog
  - Panorama / Palo Alto Networks
  - Cisco Systems
- Proactive mindset toward identifying service issues and bottlenecks and delivering performance improvements.
Favorable skills and qualifications
- Strong interpersonal skills and a mentoring mindset.
- Fluency in English; competency in Spanish is a plus.
- Experience with:
  - Agile sprint and project management methodologies
  - Jira and Confluence administration
  - Linux, Windows, and macOS system administration
- Ability to occasionally commute to one of our office locations listed above.
Required Experience:
Senior IC