Job Title: Site Reliability Engineer
Location: Singapore
Job Type: Full-time
Responsibilities:
- Cluster Operations & Management
- Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
- Ensure optimal performance, scalability, and reliability of distributed systems
- Infrastructure Platform Development
- Design, build, and enhance infrastructure operation platforms
- Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
- Drive platform standardization and automation initiatives
- High Availability & Reliability
- Ensure maximum uptime for production services through proactive monitoring and incident response
- Continuously optimize service architecture, deployment strategies, and operational processes
- Implement and maintain SLA/SLO frameworks and reliability engineering practices
- Automation & Process Improvement
- Lead the development of automated operations and maintenance systems
- Create self-service tools and workflows to improve team productivity
- Establish best practices for Infrastructure as Code and configuration management
Required Qualifications
- Experience & Education
- 2 years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
- Bachelor's degree in Computer Science, Engineering, or a related technical field preferred
- Cloud & Infrastructure
- Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
- Strong understanding of large-scale internet architecture and distributed systems
- Proven experience with infrastructure monitoring, logging, and observability tools
- Technical Skills
- Proficiency in scripting and automation using Shell, Python, or similar languages
- Strong knowledge of containerization technologies (Kubernetes, Docker)
- Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
- Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch
Advanced Networking (Preferred)
- Experience with service mesh architectures, Cilium CNI, and eBPF technologies
- Understanding of network security, load balancing, and traffic management
- Knowledge of cloud-native networking patterns and best practices