Overview
We're looking for a Data Reliability Engineer to help keep our trading and data platforms running. In this role, you'll be the guardian of our data pipelines, ensuring that trading-critical Airflow workflows and Python-based jobs run smoothly, on time, and with precision. You'll dive deep into incidents when they happen, diagnose issues quickly, and make the fixes that keep downstream systems healthy and reliable. Your work will directly support the speed and stability our trading teams depend on every day.
You'll collaborate closely with quantitative researchers, data and reliability engineers, and developers to review changes, manage releases, and strengthen our overall deployment practices. Beyond keeping things running, you'll help make them better: improving monitoring, alerts, and recovery processes while automating wherever possible. This role is perfect for someone who thrives on ownership, loves solving complex operational puzzles, and is passionate about building robust, high-performance data systems that never miss a beat.
In this role you will:
- Ensure Platform Reliability - Monitor and maintain trading-critical Airflow DAGs and Python-based pipelines, ensuring jobs run on time and within SLAs
- Incident Response & Recovery - Triage, troubleshoot, and resolve failures quickly; validate downstream impacts and maintain tested rollback/recovery procedures
- Change & Release Management - Act as a release gatekeeper: review code/config changes, enforce safe deployment standards, and coordinate risk-aware releases via Git(lab) and Octopus Deploy
- Collaboration & Communication - Partner with quants and engineers to assess change impacts, document runbooks, and communicate operational updates and risks
- Continuous Improvement - Enhance monitoring, alerting, and automation; track KPIs and drive initiatives that strengthen platform resilience and reduce incident recurrence
What we're looking for
- Degree in a technical or business discipline, or equivalent industry experience of 1 year
- Demonstrated experience with Python or an equivalent language
- Excellent analytical & troubleshooting skills; self-motivated and curious
- Willing to work shift hours to cover early and late responsibilities (alternating)
- Experience with Change Management and Incident Management procedures
- Experience of technical documentation & support cases
Desirable
- Financial Trade Floor experience is a plus but not essential; training will be provided.
- Knowledge of system monitoring tools such as CheckMK, Splunk, ELK
- Awareness of modern distributed application systems and the glue that binds them - messaging and database systems
If you're a recruiting agency and want to partner with us, please reach out to . Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.