Operation Engineer

Guangzhou - China

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Operation Engineer

Role summary - We're looking for a passionate Site Reliability Engineer (SRE) to join our infrastructure team. You must be a systems thinker, problem-solver, and automation advocate with a proven track record of building resilient, scalable systems. If you thrive in bridging development and operations, obsess over monitoring and metrics, and geek out on turning manual processes into self-healing systems, we want you.

Location: Guangzhou
On-site
Full-time

What you'll do

Build, maintain, and optimize self-built Kubernetes platforms.
Deploy and maintain Alibaba Cloud ACK (Alibaba Cloud Container Service for Kubernetes).
Build, maintain, and optimize Alibaba Cloud integrated delivery platforms.
Deploy and maintain distributed file systems such as GlusterFS.
Deploy, maintain, and use Jenkins CI/CD (Continuous Integration/Continuous Deployment).
Support daily system releases and production operations.
Design, build, and maintain highly available and scalable distributed systems.
Track system health in real time through monitoring, alerting, and automation tools (e.g. Prometheus, Grafana).
Develop automated tools and scripts (e.g. Python/Shell) to reduce manual operations and improve O&M (Operations and Maintenance) efficiency.
Analyze system capacity requirements, formulate capacity expansion strategies, and balance cost against performance.
Identify performance bottlenecks and optimize service response time, throughput, and resource utilization.
Write and maintain technical and O&M documentation.
Learn and introduce other useful tools.
Undertake other tasks assigned by leadership.

Who you are

More than 5 years of experience in O&M, SRE (Site Reliability Engineering), or DevOps; practical experience with large-scale distributed systems is preferred.
Familiar with the deployment and maintenance of components on cloud platforms such as Alibaba Cloud, AWS, and Azure.
Familiar with the deployment, maintenance, and optimization of cloud Kubernetes services such as ACK (Alibaba Cloud Container Service for Kubernetes) and AKS (Azure Kubernetes Service).
Familiar with SRE methodologies (e.g. SLI/SLO/error budget) and able to troubleshoot faults and optimize systems.
Proficient in both cloud-based and self-built Kubernetes, with end-to-end project implementation experience.
Skilled in the deployment and maintenance of distributed file system components such as GlusterFS.
Familiar with monitoring and logging systems (e.g. Zabbix, Prometheus, Grafana, ELK, Datadog).
Excellent logical thinking and problem-solving abilities.
Strong sense of responsibility and self-motivation in O&M work.
Strong self-learning, comprehension, and hands-on abilities.
Good communication and team collaboration skills.

Key Skills

Change Management
Software Deployment
Cloud Infrastructure
High Availability
IaaS
Firewall
Linux
Middleware
JBoss
Network Architecture
Scripting
Technical Support

About Company

Payoneer

In today’s borderless digital world, Payoneer enables millions of businesses and professionals from more than 200 countries and territories to connect with each other and grow globally through our cross-border payments platform. With Payoneer’s fast, flexible, secure and low-cost solu ...
