Principal Engineer DBRE

Arcesium

Job Location:

Hyderabad - Pakistan

Monthly Salary: Not Disclosed

Posted on: 21 hours ago

Vacancies: 1 Vacancy

Job Summary

Company Overview

Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world's most sophisticated financial institutions. We constantly innovate our platform and capabilities to meet tomorrow's challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.

Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation. Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities. We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.

We are looking for an exceptional engineer to provide expert-level technical leadership for our Database Reliability Engineering (DBRE) platform. This is a hands-on individual contributor role that owns the architectural direction for our most complex database reliability challenges: high availability, disaster recovery, observability, and platform automation across thousands of SQL Server, Aurora PostgreSQL, and Snowflake environments running mission-critical workloads for the world's most sophisticated financial institutions.

What you'll do:

Drive architectural direction for the database platform across SQL Server, Aurora PostgreSQL, and Snowflake, covering high availability, disaster recovery, replication, backup and recovery, capacity, performance, and security.
Own complex cross-cutting initiatives such as cross-region disaster recovery, platform refresh orchestration, alerting redesign, and cost optimization, taking each from problem statement through to a deployed, owned solution.
Lead by example with exemplary code, design documents, RFCs, and runbooks, setting the standard for technical writing, code quality, and operational rigor across the DBRE team.
Reduce operational toil by engineering automation across provisioning, refresh, patching, scaling, failover, and decommissioning, treating manual operations as bugs to be eliminated.
Lead alert engineering to drive sustainable reductions in alert volume while improving signal quality, partnering with application teams on alert ownership attribution and SLA design.
Drive incident response and root-cause analysis for the most complex production incidents, and convert RCAs into platform-level improvements that prevent recurrence.
Define reliability KPIs (availability, MTTR, alert sustainability, SLA adherence) and build the dashboards and reporting cadence to track them.
Partner with application engineering, infrastructure, and SRE teams on schema design, query performance, data lifecycle, and shared reliability patterns, and engage senior leadership on strategy, multi-quarter roadmaps, and budget trade-offs.

What you'll need:

A bachelor's or master's degree in computer science, engineering, or a related field, with 9 years of professional engineering experience, including significant time in a principal-level or equivalent individual contributor role.
Deep hands-on expertise in at least one major relational database platform (SQL Server or PostgreSQL), including replication, HA/DR architectures, performance tuning, query optimization, and internals.
Strong working knowledge of cloud infrastructure (AWS preferred): VPC networking, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
Strong programming skills in at least one of Python, PowerShell, Go, or T-SQL; capable of writing production-quality automation, not just scripts.
A proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
Exceptional verbal and written communication skills, with the ability to produce clear design documents and executive-level summaries and to influence stakeholders across engineering, infrastructure, and business teams.
Experience across multiple database platforms (SQL Server / PostgreSQL / Snowflake / Aurora) and familiarity with financial-services data domains are a bonus.

Arcesium's Personal Data Privacy Notice for Candidates is linked here.

Recruiting Security
Emails from genuine Arcesium recruiters who are employees of the company will always come from [...]. In some cases, you may also be contacted by independent search firms engaged to recruit on our behalf; emails from their employees should always come from their firm's applicable domain. We'll never ask for your banking information or any payment as part of the recruiting process. If something seems off or you're contacted by an unexpected third party, please reach out to us at (US/UK), (India), or (Portugal/Sweden).

Arcesium is an equal opportunity employer.

Required Experience:

Staff IC

About Company

Arcesium

Arcesium's scalable, cloud-native solutions help clients in the investment industry transform operations with new and better financial data management.
