Job Summary:
Squarepoint is looking for a talented and highly motivated Ultra Low Latency Platform Engineer to provide solutions across Squarepoint's global colocation (COLO) estate, consisting of 400 servers across 30 global locations. The candidate will be responsible for project delivery, support escalations, monitoring, automation, security, documentation, and capacity management for Squarepoint's low-latency infrastructure. This will involve collaborating with our business partners, application owners, clients, vendors, and internal teams (SRE, Network, Application Support, Application Development, Quants, etc.) to deliver end-to-end solutions in a timely manner.
Responsibilities:
- Manage systems efficiently at scale through standardization, automation, testing, and in-depth monitoring
- Enforce development standards for source control, testing, and continuous integration for infrastructure, OS patches, and configuration management
- Manage a distributed compute environment and multiple petabyte-scale storage systems
- Install, manage, and monitor the Linux operating system (RHEL-based)
- Troubleshoot complex hardware and software issues throughout the Squarepoint technology stack
- Create self-healing systems and automated recovery processes
- Respond to system incidents and participate in on-call rotations
- Conduct root cause analysis of incidents and outages
- Reduce operational toil through the development of user-driven automated workflows
- Work with business owners to regularly re-prioritize the book of work while delivering both tactical and long-term objectives
Required Qualifications:
- 5 years of experience working with Linux (RHEL/CentOS/Rocky preferred) in a large, complex, or niche environment, with a focus on operations, systems engineering, and systems performance.
- Server management and support: HP, Supermicro, Dell, and various overclocked servers.
- Experience with low-latency network interfaces and kernel bypass (configuration and optimization): Solarflare with Onload, Mellanox with VMA.
- Experience with build and configuration management tools, specifically Chef or Ansible.
- Experience with observability tools, specifically Grafana and Prometheus.
- Highly motivated, with a keen eye for scripting and automation in Python, Ruby, and Bash.
- In-depth knowledge of server network stack configuration, tuning, and troubleshooting, including TCP, UDP (unicast/multicast), NTP, PTP, and Wireshark/tshark.
- Critical thinking and problem-solving skills for troubleshooting the unknown, glitches, and the obscure.
- Good understanding of trading venues such as Nasdaq, LSE, Euronext, etc.
- Degree in Engineering or Computer Science, or related experience.