SRE Manager

XREX


Job Location:

Taipei City - Taiwan

Monthly Salary: Not Disclosed
Vacancies: 1 Vacancy

Job Summary

About

Want to build a worldwide brand from Taiwan and share our brand story with millions of users around the world?

Want to be based in Taiwan but work in a Silicon Valley-like environment, building world-class brands and products?

Want to take part in the global fintech and blockchain movement at an English-speaking workplace?

Come change the world with us! Join this fast-growing startup founded by software veterans and funded by top VCs, Skype co-founders, and the Taiwanese government (NDF)!

We're hiring an experienced SRE Manager. Beyond the core requirements, the exact mix of your skills matters less than having a broad toolkit. You should be willing to tackle anything that comes your way, learn on the fly, and get things done.

Come talk to us if you want to push your skill set in a dynamic, fast-paced environment.



Responsibilities:

  • Lead and manage the SRE team to ensure the high availability, scalability, and reliability of production systems
  • Own AWS cloud infrastructure operations, including monitoring, security, resource management, and cost optimization, in a 24/7 environment
  • Lead incident management, troubleshooting, root cause analysis (RCA), and post-incident improvements
  • Ensure that infrastructure, cloud environments, and operational processes comply with security, audit, and regulatory requirements (e.g., MAS TRM, ISO 27001)
  • Drive SRE best practices, including observability, alerting, SLA/SLO/SLI definition, capacity planning, disaster recovery, and high availability
  • Improve system performance, reliability, and operational efficiency through automation and architecture optimization
  • Build and maintain CI/CD, IaC, and GitOps workflows to improve deployment efficiency and system consistency
  • Manage Kubernetes / EKS platforms and containerized infrastructure
  • Collaborate closely with the Backend, Data, Security, and Product teams on architecture design and operational improvements
  • Build and improve monitoring and observability platforms such as Grafana, ELK, CloudWatch, Zabbix, and Nagios
  • Mentor team members, support their technical growth, and drive cross-functional collaboration
  • Maintain operational documentation, SOPs, and incident reports
  • Participate in and improve on-call and incident response processes

Requirements:

  • 8 years of Linux system administration and large-scale infrastructure experience
  • 2 years of team management or Tech Lead experience
  • Hands-on experience operating high-traffic, high-availability cloud platforms in a 24/7 environment
  • Strong experience with AWS services including:
    • EC2, API Gateway, AppSync
    • VPC, IAM, Networking
    • Lambda, Aurora, ElastiCache (Redis)
    • CloudFront, CloudWatch, EKS
    • Security Services, SNS, Parameter Store, Secrets Manager
  • Strong Kubernetes and container infrastructure experience including EKS administration and troubleshooting
  • Experience with Infrastructure as Code and configuration management tools such as Terraform, Helm, and Kustomize
  • Experience with CI/CD and GitOps tools such as Jenkins, GitHub Actions, Argo Workflows, and Argo CD
  • Familiarity with observability and monitoring tools, including Grafana, ELK, Zabbix, and Nagios
  • Experience managing distributed systems and related technologies such as MongoDB, Kafka, load balancers, and HA architecture
  • Strong understanding of SRE/DevOps practices, including incident management, capacity planning, disaster recovery, and SLA/SLO/SLI
  • Proficiency in scripting or programming languages such as Bash, Python, or Golang
  • Knowledge of cloud security, infrastructure security, and technical risk management
  • Strong communication, collaboration, and problem-solving skills in fast-paced environments
  • Experience in FinTech, crypto, or high-availability platforms is a plus
  • Familiarity with compliance and security frameworks such as MAS TRM and ISO 27001 is a plus

Location: Taipei

Required Experience:

Manager


About Company


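Specialized in cloud operations but lacking an international platform? Want to work with a top team, using the latest technologies and thinking, to build a cloud operations platform? Want hands-on experience with the latest cloud operations architectures on AWS? Want to be based in Taiwan, work the way a Silicon Valley team does, and interact with world-class international talent? XREX is hiring: a Senior SRE (System Reliability Engineer) with a DevOps mindset and skills.

[Responsibilities]
1. Handle day-to-day operations of the AWS platform to keep systems running reliably 24/7, including system, application, and log monitoring, component upgrades, and security incident handling.
2. Analyze system bottlenecks and optimize architecture and performance.
3. Build and maintain monitoring systems such as Zabbix, Nagios, and ELK, and handle alerts ...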
