The role:
As part of the SoFi Incident Management Enablement team, you will be responsible for delivering and maintaining critical systems to facilitate response, documentation, and reporting of incidents, and for driving standardization and adoption of tooling or processes to improve detection and mitigation of impact on SoFi services and customer experience.
The role needs experienced engineers with excellent cross-functional and communication skills, as well as backgrounds in operations, software development, systems performance, and resilience engineering, who always focus on delivering a quality experience to our customers.
What you'll do:
- Build the future of the SoFi Incident Management platform with designs treating infrastructure as code
- Deploy and improve instrumentation for monitoring and logging the health and availability of services
- Configure and maintain software and system management tools
- Proactively ensure the highest levels of system availability
- Develop automation to improve operational efficiencies and system integration
- Partner with engineering teams in the design of systems and infrastructure
- Provide 2nd and 3rd level support
- Liaise with vendors and other IT personnel for problem resolution
- Participate in deep technical design discussions within your team and across partner teams to ensure that we're building the right systems and keeping the quality high
What you'll need:
- 5 years of experience as an SRE and/or software development engineer
- Strong understanding of SRE principles
- BS/MS degree in Computer Science, Engineering, or a related subject, or relevant experience
- Strong experience building, analyzing, and troubleshooting scalable distributed platforms
- Proven working experience in installing, configuring, and troubleshooting applications running on UNIX/Linux-based environments
- Experience with automation software (e.g. Terraform, Puppet, cfengine, Chef, Ansible)
- Experience in the design and implementation of technical solutions to satisfy functional and non-functional requirements while ensuring quality, scalability, and timely delivery
- Demonstrated proficiency in writing complex scripts in a standard scripting language (e.g. Go, Ruby, Python)
- Solid networking knowledge (OSI network layers, TCP/IP)
- Experience leading cross-organizational efforts with different teams to identify operational challenges and implement solutions
- Experience operating in the cloud, preferably in AWS
Required Experience:
Senior IC