Role Overview:
We are seeking a Senior MemSQL / SingleStore Cluster Administrator to own and manage mission-critical, large-scale distributed database platforms. This role requires a pure Database Administrator (DBA) with deep expertise in handling petabyte-scale data, complex distributed clusters, and real-time, latency-sensitive workloads.
Core Technical Expectations
- Experience handling petabytes of data ingested every 15 minutes in large-scale environments.
- Strong expertise managing large MemSQL / SingleStore clusters (multi-node, multi-TB to multi-PB).
- Deep understanding of data distribution across aggregators and leaf nodes.
Expertise in:
- Partitioning and shard key strategy
- Data skew mitigation
- Hot partition resolution
- Worker node and leaf node optimization
Strong table-level knowledge including:
- Index strategy
- Thread management
- Connection pooling
- Memory limits
- Query plan optimization
- Strong understanding of different MemSQL/SingleStore versions and corresponding architectural/feature changes.
Key Responsibilities
- End-to-end ownership of large MemSQL/SingleStore clusters (design, build, upgrade, operate, decommission).
Architect and maintain High Availability (HA) and Disaster Recovery (DR) setups, including:
- Redundancy levels
- Availability groups
- Cross-region replication
Plan and execute:
- Cluster expansion
- Downsizing
- Online partition rebalancing
- Leaf node management with minimal/no downtime
- Proactively monitor cluster health, throughput, latency, and capacity; define and maintain SLAs.
Perform advanced performance tuning:
- Schema design
- Shard key design
- Index strategy
- NUMA and memory tuning
- Workload management
- Implement backup/restore strategies and regularly test DR & failover.
- Lead incident response and perform deep root cause analysis.
Enforce database security best practices:
- Authentication & authorization
- Encryption
- Auditing
- Network controls
- Drive automation using scripting (Python/Bash) and Infrastructure as Code.
- Maintain documentation, operational runbooks, and standards.
- Evaluate new MemSQL/SingleStore features and lead version upgrades and migrations.
Required Experience & Skills
- 10 years of total database engineering/administration experience.
- 4 to 5 years of deep production-grade experience administering MemSQL/SingleStore clusters at scale.
Strong hands-on experience with:
- Aggregators & leaf nodes
- Licensing and memory limits
- Cluster expansion & partition rebalancing
- Replication & failover/failback
Proven ability to diagnose:
- Locking issues
- Data skew
- Hot partitions
- Bad execution plans
Strong Linux system tuning knowledge:
- CPU/NUMA affinity
- Disk & I/O optimization
- Networking
- Ulimits & OS-level tuning
Experience with monitoring & alerting tools:
- Prometheus / Grafana
- Datadog
- Splunk
- ELK
- Strong SQL expertise and scripting (Python/Bash).
- Experience in Cloud/Container environments (AWS/Azure/GCP, Kubernetes) is highly preferred.
- Excellent communication skills, with the ability to lead production calls and explain technical trade-offs clearly.