Observability Platform Engineer
Job Summary
WHO WE ARE
Optiver is a tech-driven trading firm and leading global market maker. For over 35 years, Optiver has been improving financial markets worldwide, making them more transparent and efficient for all participants. With more than 1400 employees in offices around the world, we're united in our commitment to improving the market through competitive pricing, execution and thorough risk management. We provide liquidity on multiple exchanges across the world and actively trade on 70 exchanges, where we're trusted to always provide accurate buy and sell pricing, no matter the market conditions.
WHAT YOU'LL DO
We are looking for a Senior Observability Platform Engineer to help evolve observability as a business-critical platform capability at Optiver. You will work on the shared platform behind metrics, logs, traces, events, alerts, dashboards, diagnostics, instrumentation and service health.
This is a platform engineering role for someone who enjoys building reliable systems used by other engineers. You will help turn a capable but heterogeneous observability foundation into a globally consistent, regionally federated platform that is reliable at scale, easy to adopt and deeply embedded in how Optiver builds and operates production systems.
As a Senior Observability Platform Engineer, you will design, build and operate components that help engineers, operators, trading teams, automated systems and future agent-based workflows collect, query, understand and act on production signals. You will work across platform and production domains: building high-scale telemetry pipelines, improving instrumentation quality, creating golden paths for adoption and making observability more useful during real production investigations.
In this role you will:
Design, build and operate components of Optiver's shared observability platform across telemetry collection, ingestion, storage, query, visualisation, alerting, diagnostics and service health.
Build software services, APIs, integrations, libraries, dashboards, automation and reusable patterns that make observability easier to adopt and more reliable to operate.
Improve the scalability, reliability, performance, cost-effectiveness and operational quality of high-volume telemetry systems.
Improve developer and operator experience through self-service workflows, golden paths, documentation, investigation tooling and practical platform abstractions.
Work with engineering, infrastructure, trading systems, research and regional operations teams to understand production debugging needs and improve observability adoption.
Own the reliability and operational quality of the components you build, including service health, failure modes, monitoring, incident learnings and continuous improvement.
Raise the standard for telemetry quality, instrumentation, alerting, dashboards, diagnostic workflows and service health across Optiver.
WHAT YOU'LL BRING
You are a strong engineer with experience in production systems, platform engineering, SRE, infrastructure, observability or distributed systems. You are comfortable working on systems that need to be reliable, scalable, understandable and useful to other engineers.
You understand that observability is not just a tooling problem. It is about signal quality, platform reliability, developer experience, production workflows and adoption. You care about building systems that engineers trust, operators can rely on and production teams can depend on during high-pressure situations.
You will bring:
Strong engineering experience in SRE, software engineering, platform engineering, infrastructure, observability, developer tooling or distributed systems.
A production mindset, with the ability to reason about failure modes, debugging workflows, service reliability, operational impact and how systems behave under pressure.
Technical understanding of modern observability practices across logs, metrics, traces, events, alerting, dashboards, telemetry pipelines, diagnostics, instrumentation quality and service health.
Experience designing, building or operating reliable services, platforms, pipelines, tools or automation used by other engineering teams.
Good judgement in technical trade-offs across performance, scalability, reliability, complexity, cost and maintainability.
A delivery mindset, with the ability to take ambiguous platform problems and turn them into practical, reliable solutions.
Strong preference will be given to candidates with experience on observability, SRE, infrastructure, platform, production engineering or developer tooling teams in large-scale distributed systems environments, including telemetry pipelines, streaming systems, time-series data, log platforms, query systems, alerting systems or production diagnostics tooling.
Experience with technologies such as Kafka, Grafana, ELK/OpenSearch, ClickHouse, VictoriaMetrics, InfluxDB, Telegraf, Vector, OpenTelemetry, Prometheus-style systems or custom telemetry collectors is valued.
WHAT YOU'LL GET
A performance-based bonus structure unmatched anywhere in the industry. We combine our profits across desks, teams and offices into a global profit pool, fostering a truly collaborative environment.
The chance to work alongside diverse and intelligent peers in a rewarding environment.
Training, mentorship and personal development opportunities.
Daily breakfast, lunch and an in-house barista.
Gym membership plus weekly in-house chair massages.
Regular social events, including a company trip every two years.
Guided relocation, a competitive relocation package and visa sponsorship where necessary.
DIVERSITY STATEMENT
Optiver is committed to diversity and inclusion. We encourage applications from candidates of all backgrounds and welcome requests for reasonable adjustments during the process.
Questions? Get in touch with the recruitment team at.
Required Experience:
IC