Senior DevOps Engineer

Shangri-La Group


Job Location: Shanghai, China

Monthly Salary: Not Disclosed
Posted on: 3 days ago
Vacancies: 1 Vacancy

Job Summary

Headquartered in Hong Kong, we have over 100 hotels and resorts under four brands, nestled in key cities and beautiful beachfront locations globally. We are expanding rapidly, with a strong development pipeline throughout Asia, the Middle East, Europe and Africa.

Regarded as one of the world's finest hotel ownership and management companies, Shangri-La is dedicated to delighting guests around the world with legendary service, finely tuned from over 45 years of hospitality from the heart. We have an affinity with Asian travelers and offer them a gateway to the rest of the world, positioning us as a leading brand in luxury hospitality.

As an enviable employer with industry-leading levels of colleague engagement, our people are our priority. Our success is only made possible through the efforts and abilities of over 42,000 colleagues. In accordance with this belief, the focused investment we make in the learning and development of our colleagues is unparalleled in the global hospitality industry. From welcoming new colleagues to best-in-class leadership development, you can be sure that potential is identified and nurtured throughout your career.

Basic Qualifications

  1. Bachelor's degree or above in a computer-related major, with at least 5 years of hands-on experience in DevOps
  2. Full-lifecycle delivery experience in large-scale Internet or cloud-native projects, with the ability to solve complex engineering problems
  3. Basic English listening and speaking skills

Core Technical Requirements

  1. Deep proficiency in Kubernetes architecture and its ecosystem, with an in-depth understanding of container networking, storage, scheduling and security isolation. Experience with multiple from-scratch containerization and cluster build-out projects, and the ability to independently own the containerization architecture design and delivery of large-scale projects.
  2. Proficient in the full CI/CD process and skilled in mainstream toolchains including Jenkins, GitLab CI and Argo CD; able to lead the architecture design and iterative optimization of R&D efficiency platforms
  3. Solid AI engineering capabilities: familiar with large-model deployment architecture and AI workload characteristics; able to build containerized runtime environments suited to AI R&D and to support resource scheduling and operations across model training, fine-tuning and inference; experience building integrated AI development, testing and operations processes
  4. Able to optimize DevOps processes using large-model capabilities and to take part in rolling out AI-assisted R&D efficiency tools (such as AI-assisted code review (CR), intelligent alert analysis and automatic root cause localization of faults); experience integrating and operating AI components (model gateway, Agent, knowledge base services)
  5. Solid infrastructure-as-code capabilities: proficient in tools such as Terraform, Ansible and Helm; familiar with the architecture design and best practices of mainstream public clouds (Alibaba Cloud/AWS/Tencent Cloud)
  6. Mastery of at least one programming language (Python/Go/Java preferred), with experience in troubleshooting, performance tuning and high-availability architecture design for large distributed systems
  7. Familiar with observability systems; able to build complete monitoring, alerting and distributed tracing systems based on Prometheus, Grafana, ELK and other technologies, and to optimize monitoring and elastic resource scaling strategies for the characteristics of AI tasks

Soft Skill Requirements

  1. A strong sense of service assurance; able to lead the handling of major production incidents and drive the establishment of a stable and reliable production environment assurance system
  2. A clear awareness of user experience; able to deliver easy-to-use and efficient DevOps tools and processes for AI R&D and business R&D teams, and to continuously optimize R&D delivery efficiency and user experience
  3. Excellent technology selection and solution design capabilities; able to balance technological advancement against implementation cost based on AI project and business requirements
  4. Good cross-team communication and technical influence; able to drive the technical growth of the team and promote the adoption of an AI-enabled DevOps culture

Required Experience:

Senior IC
