Site Reliability Engineer

Second Talent

Job Location: Singapore - Singapore

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1

Job Summary

Job Title: Site Reliability Engineer

Location: Singapore

Job Type: Full-time


Responsibilities:

  • Cluster Operations & Management
    • Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
    • Ensure optimal performance, scalability, and reliability of distributed systems
  • Infrastructure Platform Development
    • Design, build, and enhance infrastructure operation platforms
    • Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
    • Drive platform standardization and automation initiatives
  • High Availability & Reliability
    • Ensure maximum uptime for production services through proactive monitoring and incident response
    • Continuously optimize service architecture, deployment strategies, and operational processes
    • Implement and maintain SLA/SLO frameworks and reliability engineering practices
  • Automation & Process Improvement
    • Lead the development of automated operations and maintenance systems
    • Create self-service tools and workflows to improve team productivity
    • Establish best practices for infrastructure as code and configuration management

Required Qualifications

  • Experience & Education
    • 2 years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
    • Bachelor's degree in Computer Science, Engineering, or a related technical field preferred
  • Cloud & Infrastructure
    • Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
    • Strong understanding of large-scale internet architecture and distributed systems
    • Proven experience with infrastructure monitoring, logging, and observability tools
  • Technical Skills
    • Proficiency in scripting and automation using Shell, Python, or similar languages
    • Strong knowledge of containerization technologies (Kubernetes, Docker)
    • Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
    • Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch

Advanced Networking (Preferred)

  • Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies
  • Understanding of network security, load balancing, and traffic management
  • Knowledge of cloud-native networking patterns and best practices

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting