ABOUT YOU
We are looking for a Technical Service Operations Lead (TSO Lead) who is operationally driven, collaborative, analytical, and a strong communicator to join our Global Technical Operations (GTO) team. The best candidate will be someone who thrives in a fast-paced, highly collaborative, and exceptionally dynamic setting and is excited to help coordinate incident response alongside cross-functional teams, identify trends and patterns in production issues, improve how we communicate with partners during incidents, and drive continuous improvement through post-incident reviews.
Strong incident management experience, ITIL knowledge, and observability/monitoring expertise are essential, along with experience in technical operations, SRE, or NOC environments supporting high-availability platforms (payments, e-commerce, SaaS, or gaming). The ability to communicate clearly and effectively in English, both written and verbal, across technical and executive audiences will be key to your success in this role.
If you're passionate about driving operational excellence and platform reliability at scale, and love ensuring the reliability and uptime of commerce and payment solutions that game developers and players depend on, we would love to hear from you!
Technical Service Operations Lead (TSO Lead), Kuala Lumpur
ABOUT US
Xsolla is a global commerce company with robust tools and services to help developers solve the inherent challenges of the video game industry. From indie to AAA, companies partner with Xsolla to help them fund, distribute, market, and monetize their games. Grounded in the belief in the future of video games, Xsolla is resolute in the mission to bring opportunities together and continually make new resources available to creators. Headquartered and incorporated in Los Angeles, California, Xsolla operates as the merchant of record and has helped over 1,500 game developers to reach more players and grow their businesses around the world. With more paths to profits and ways to win, developers have all the things needed to enjoy the game.
For more information, visit .
Responsibilities:
Serve as Incident Commander for major incidents, coordinating cross-functional response teams, driving investigation, making escalation decisions, and ensuring incidents are resolved within SLA targets.
Own all incident communications: draft and send clear, timely updates to senior leadership, Customer Success, and partner/customer contacts throughout the incident lifecycle, and manage customer-facing status page updates.
Facilitate blameless Post-Incident Reviews (PIRs) for major incidents, leading root cause identification, assigning corrective actions with clear owners and deadlines, and tracking them to closure.
During non-incident periods, proactively analyze incident trends, recurring issues, and production bugs; identify patterns; create Problem tickets; and report findings and recommendations to product and engineering teams on a regular cadence.
Enforce the incident management framework across the organization, including the severity model, priority matrix, SLA targets, escalation procedures, and deployment readiness gates.
Oversee and mentor the Operations Engineer on your shift, coaching on triage, investigation, runbook execution, and documentation quality, while conducting regular knowledge transfer sessions to build depth across the service portfolio.
Produce shift handoff reports and deliver regular operational reporting: incident trends, KPI performance (MTTD, MTTA, MTTR), SLA adherence, proactive detection rate, and repeat incident analysis.
Audit service catalogue completeness on a regular cadence and govern JIRA Service Management workflows for incident, PIR, and problem management.
Cover for the Operations Engineer role during absences, breaks, or surge incidents. Participate in weekend on-call rotation for major incidents.
Qualifications:
6 years of experience in incident management, SRE, NOC leadership, or technical operations in a production environment supporting high-availability, high-transaction systems (payments, e-commerce, SaaS, or gaming platforms preferred).
Proven incident management experience: coordinating multi-team response, making real-time escalation decisions, and communicating with executive stakeholders under pressure.
Excellent written and verbal communication skills in English: ability to draft clear, concise executive updates at 3 AM under pressure, facilitate blameless PIRs, present operational metrics to senior leadership, and communicate incident status to customers and partners with clarity and professionalism.
Strong ITIL foundation: understanding of incident, problem, and change management lifecycles, with practical experience implementing or operating ITIL-aligned workflows.
Technical depth across the observability stack: ability to read and interpret logs, traces, and metrics in Datadog (or equivalent: Grafana, Splunk, New Relic). Understanding of APM, SLOs, error budgets, burn-rate alerting, and synthetic monitoring.
Hands-on experience with incident tooling: Datadog, PagerDuty or OpsGenie, JIRA or JIRA Service Management, Slack, and Confluence.
Analytical mindset: ability to identify trends, patterns, and recurring issues from incident data and translate them into actionable recommendations for product and engineering teams.
Experience with SLA/SLO-driven operations where MTTD, MTTA, and MTTR are measured, reported, and improved.
Experience with, or strong interest in, AI/ML-assisted operations: anomaly detection, alert correlation, predictive alerting, automated remediation, or self-healing automation.
Comfort with 24x7 shift-based operations as part of a follow-the-sun model with handoff overlaps. Weekend on-call (rotating) for critical severities is required.
Nice to have:
Experience in the gaming, payments, or fintech industry.
Experience with customer/partner-facing incident communications and status page management.
JIRA Service Management administration experience: workflows, SLA timers, automation rules, queues, and permissions.
Familiarity with Datadog Service Catalog, scorecards, and SLOs, especially burn-rate alerts and multi-window SLOs.
Experience building an operations function from scratch: defining processes, writing runbooks, establishing governance cadences.
Background in Kubernetes, cloud infrastructure (GCP preferred), microservices architecture, or distributed systems.
ITIL certification (Foundation or higher) is a plus but not required.
RM240,000 - RM300,000 a year
BENEFITS
Convenient work tools
Latest Mac workplaces, additional hardware to make you more effective at work
Google Chat, Gmail, Google Drive, Confluence, Jira, GitLab
Professional growth
Free trainings and participation in specialized conferences
Rich knowledge exchange within the company
More perks
Health insurance (medical, dental, and optical) for employee and dependants
Flexible hours: organize your day according to your needs and sprint & teamwork demands
No dress code
Comfortable and new office environment
The duties of this position may change from time to time so the individual and organization can achieve their results. This job description is intended to describe the general level of work being performed. It is not intended to be all-inclusive. By submitting your application, you consent to Xsolla conducting background checks, where permitted by law, after the final interview stage. All checks will comply with local regulations, and your information will be handled confidentially. Xsolla KL Sdn Bhd takes your privacy very seriously and will not sell or externally distribute any data received during the hiring process. Pursuant to the Personal Data Protection Act 2010 (PDPA), Xsolla KL Sdn Bhd is mindful of and committed to the protection of your personal information and your privacy. Please direct any inquiries regarding your data privacy to [email protected].
For more vacancies: Careers Xsolla
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.