Site Reliability Engineer

Arvion Services

Job Location: Kuala Lumpur, Malaysia

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Job Overview

We are seeking a Site Reliability Engineer (SRE) to support large-scale, distributed, fault-tolerant systems in a global technology environment. This role combines software engineering and systems operations to improve system reliability, scalability, automation, and performance.

What You Will Do:

Design, build, and maintain scalable and highly available infrastructure systems.
Develop automation tools and scripts to improve operational efficiency.
Monitor system performance and troubleshoot infrastructure issues proactively.
Implement monitoring, alerting, SLIs, SLOs, and SLA tracking.
Participate in 24/7 on-call rotations and incident response activities.
Conduct root cause analysis and support post-mortem reviews.
Collaborate with engineering and cross-functional teams on system improvements.
Ensure infrastructure security, compliance, and adherence to reliability best practices.
Support containerized environments using Docker and Kubernetes.

What Makes You A Good Fit:

Bachelor's or Master's degree in Computer Science, IT, Engineering, or a related field.
Minimum 3 years of experience in SRE, Systems Engineering, or Software Engineering.
Proficient in programming languages such as Python, Go, Java, or C.
Strong Linux systems and networking knowledge.
Experience with Docker, Kubernetes, Prometheus, and Grafana is preferred.
Knowledge of relational databases and system architecture.
Strong analytical, troubleshooting, and communication skills.

What We Offer:

Opportunity to work on large-scale global infrastructure systems.
Exposure to advanced cloud, automation, and reliability engineering practices.
Career growth within a dynamic technology environment.
Collaborative and fast-paced team culture.
