We are looking for a skilled Site Reliability Engineer to join our client's global SRE team in Singapore.
Responsibilities:
- Overseeing the continuous operation of the firm's Linux-based trading infrastructure and addressing day-to-day operational needs
- Providing second-level support, including:
  - Rapid response to emergencies
  - Implementing scheduled updates and deployments
  - In-depth analysis and resolution of performance issues
- Participating in a rotational on-call schedule, including early-morning and weekend shifts, to provide timely support
- Contributing to the development of automated solutions for server provisioning, configuration, and monitoring, enabling scalable management of thousands of servers
- Collaborating with the Trading and Core Engineering teams
- Managing essential core services such as DHCP, LDAP, DNS, and NFS across on-premises and hosted data centers as well as public clouds
Qualifications:
- Sound expertise in Linux production environments
- Basic knowledge of Python and Bash scripting
- Experience with automation and monitoring toolsets
- Comprehensive knowledge of operating system principles with a particular focus on Linux internals
- Familiarity with Intel-based server hardware and components
- Competence in server-side networking, including network protocols and configuration
- Familiarity with cloud services and architectures
- Experience designing, building, and troubleshooting complex systems
- Strong problem-solving skills underpinned by a methodical approach to technical challenges
- Effective communication and strong interpersonal skills, with a sense of responsibility and a commitment to driving projects to completion
- Sense of ownership and drive