Key Responsibilities
- Design, deploy, operate, and scale multi-region CockroachDB clusters in production environments
- Ensure high availability, fault tolerance, and data consistency for globally distributed clusters
- Monitor cluster health, latency, replication status, and resource utilization using observability tools
- Perform capacity planning and proactive scaling for future growth
- Troubleshoot complex database and infrastructure issues including:
  - Node failures
  - Network partitions
  - Leaseholder and range imbalance
  - Replication lag
  - Hotspotting
  - High latency / throughput bottlenecks
- Design disaster recovery strategies (multi-region backup/restore, failover/fallback)
- Implement and test backup, restore, and point-in-time recovery processes
- Automate provisioning, scaling, patching, and upgrades of CRDB clusters
- Perform rolling upgrades with zero or near-zero downtime
- Optimize SQL query performance and database schema efficiency
- Create operational runbooks, SOPs, and on-call playbooks for CRDB
- Participate in on-call rotations and incident response for production clusters