This role requires deep expertise in database architecture, systems engineering, and DevOps practices. You'll collaborate with engineering, operations, and development teams to proactively monitor, tune, and optimize data systems while implementing resilience strategies to minimize risk and downtime.
Key Responsibilities:
Design, deploy, and maintain resilient, scalable database systems (e.g., PostgreSQL, MySQL, SQL Server, MongoDB, Cassandra) in cloud or hybrid environments
Implement automation for provisioning, patching, backups, scaling, and failover using Infrastructure as Code (IaC) and CI/CD tools
Monitor performance, availability, and storage utilization; optimize query execution and indexing strategies
Develop and maintain disaster recovery plans, replication strategies, and backup validation procedures
Participate in on-call rotations and incident response to ensure 24/7 availability and performance of database infrastructure
Collaborate with DevOps and platform teams to improve observability and integrate databases into centralized monitoring and logging systems (e.g., Prometheus, Grafana, ELK, Datadog)
Conduct root cause analysis for outages and performance degradations; implement long-term remediations
Support schema design, versioning, and migrations in collaboration with application developers
Ensure security and compliance of database systems, including encryption, access controls, audit logging, and regulatory requirements
Document standard operating procedures (SOPs), design patterns, and system architecture
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, Information Systems, or equivalent experience
2 years of experience managing production databases in high-volume or distributed systems
Strong expertise in at least two database systems (e.g., PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, Cassandra)
Hands-on experience with AWS, GCP, or Azure database offerings (e.g., RDS, Aurora, BigQuery, Cosmos DB)
Proficiency in Linux systems administration and scripting (e.g., Bash, Python)
Familiarity with Infrastructure as Code (Terraform, CloudFormation, Ansible)
Solid knowledge of replication, sharding, partitioning, backup, and restore techniques
Understanding of SRE principles: SLIs, SLOs, incident response, and reliability testing
Strong troubleshooting and analytical skills in distributed and containerized environments (Docker, Kubernetes)
Full Time