Job Description
Support application services in assigned areas to ensure BAU stability, meet the expected incident SLAs and the system availability levels defined for peak and off-peak periods, and apply workaround solutions by modifying the current source code.
Perform root cause analysis (RCA) through deep code analysis to troubleshoot issues immediately and resolve them (short term, medium term, and long term) within the incident SLA, taking both proactive and reactive action.
Perform BAU system setup, bug fixing, and small CRs following the IT implementation methodology (build, test, deploy), aligned with company security, business objectives, and strategy.
Manage regular system patch upgrades with the product owner and business stakeholders.
Manage monitoring tools by creating scripts, robots, or AI, and ensure there is no business disruption.
Manage the support workbook and control. Ensure the knowledge base is well organized and kept up to date.
Be familiar with REST APIs (synchronous processes), message producer/consumer processes (asynchronous processes), and batch processes.
Be familiar with open-source monitoring tools such as the ELK stack and Grafana.
Be familiar with container technology such as Docker and K8s.
Be familiar with cloud technology such as AWS, Azure, and Tencent Cloud.
Qualification
Bachelor's degree in Computer Science or a related field
13 years of experience in SRE or as a Support Engineer
Strong in programming (Java, Go), basic SQL, Linux/Unix scripting, and cloud platforms (AWS, Azure, Tencent Cloud).
Hands-on experience with Docker and Kubernetes (K8s), including deploying, scaling, and troubleshooting.
Skilled in diagnosing and resolving issues quickly with experience in root cause analysis and incident response.
Knowledge of SLAs, SLOs, and automation to improve system reliability and reduce manual intervention.
Good English proficiency
Full Time