Location: Singapore, Singapore
Thales is a global technology leader trusted by governments, institutions and enterprises to tackle their most demanding challenges. From quantum applications and artificial intelligence to cybersecurity and 6G innovation, our solutions empower critical decisions rooted in human intelligence. Operating at the forefront of aerospace and space, cybersecurity and digital identity, we're driven by a mission to build a future we can all trust.
In Singapore, Thales has been a trusted partner since 1973, originally focused on aerospace activities in the Asia-Pacific region. With 2,000 employees across three local sites, we deliver cutting-edge solutions across aerospace (including air traffic management), defence and security, and digital identity and cybersecurity sectors. Together, we're shaping the future by enabling customers to make pivotal decisions that safeguard communities and power progress.
KEY ACTIVITIES AND RESPONSIBILITIES
As a Level 2 Engineer, you are accountable for:
Operational Support
- Lead and coordinate level 2 support operations for mission-critical applications and infrastructure
- Provide troubleshooting and diagnostics for incidents escalated from level 1
- Ensure adherence to system availability SLAs
Incident & Problem Management
- Act as incident manager for P1/P2 issues
- Coordinate resolution and communications
- Perform root cause analysis and recommend permanent fixes
- Escalate unresolved issues that require software code changes to Level 3 or engineering teams
Change Management
- Perform operational impact assessment
- Participate in the CAB to review and approve changes
- Carry out pre-change preparation, such as reviewing the Change Request and Release Plan
- Supervise post-change production verification
- Update documentation and carry out knowledge transfer
- Conduct post-change review and provide feedback
Patch Management
- Perform patch management readiness
- Stakeholder coordination and team coordination
- System Readiness and Post-Patch Validation
- Documentation update and knowledge transfer
- Compliance and audit readiness
Documentation and Compliance
- Operational documentation: SOPs, incident response checklists, RCA, PIR, monitoring and alert guidebook
- Configuration & Infrastructure Documentation: system configuration baselines, application dependency maps, and environment inventories such as hosts, services and accounts
- Knowledge Base Articles for level 2 enablement and faster resolution, e.g. Known Errors and Fixes, Frequent How-To Guides, Script Repositories, Lessons Learned
- Knowledge Management
Configuration Management
- Perform validation and accuracy of configurations
- Maintain readiness of operational documentation
- Perform audit to confirm compliance of configurations
- CMDB asset verification
- Change-linked configuration tracking
- Ensure environment consistency between DEV, IVVQ, ISO-PROD, UAT and PROD
Testing and Verification
- Ensure operational readiness testing is completed before production rollout
- Coordinate post-change verification
- Perform regression and sanity tests following patching or upgrades in UAT and PROD
- Participate in user acceptance testing
Knowledge Management
- Documentation of resolution
- Knowledge Base Contribution
- Validation of knowledge
- Subject Matter Expertise Sharing
Root Cause Analysis
- Gather logs and system metrics from the time of failure
- Reproduce issues in a controlled environment to understand the conditions under which they occur
- Determine the scope and severity in terms of the systems affected, downtime duration and business impact
- Narrow down the possible sources of the failure
- Use diagnostic tools to analyse application behaviour
- Correlate events to sequence the chain of events leading up to the failure and identify the dependencies
KAST (Kubernetes Analytics Stack)
- Thales' proprietary Kubernetes-based platform that provides a foundational digital infrastructure across Thales business domains
Kubernetes
- Kubernetes is an open-source platform, originally developed by Google, for automating the deployment, scaling and management of containerized applications (typically Docker containers).
Docker
- Docker Compose is a tool for defining and running multi-container Docker applications using a single configuration file. It allows you to define, manage and run multiple interconnected Docker containers as a single service stack.
Kafka
- Apache Kafka is a high-performance distributed streaming platform used for building real-time data pipelines, stream processing and event-driven architectures.
EMQX
- EMQX is an MQTT broker that acts as message middleware between publishers (e.g. sensors, devices) and subscribers (e.g. apps, dashboards, databases) using the MQTT protocol, a lightweight publish-subscribe messaging protocol ideal for low-bandwidth, high-latency networks and constrained devices.
Elasticsearch
- Elasticsearch is a distributed, open-source search and analytics engine built on top of Apache Lucene. It is widely used for full-text search, log and event data analysis, and real-time data exploration.
MinIO
- MinIO is a high-performance distributed object storage system that stores data as objects (like files, images, videos, backups) in buckets.
Zookeeper
- Apache ZooKeeper is an open-source coordination service for distributed applications. It provides a highly reliable, consistent and available mechanism to store metadata, configuration and state information. It complements Apache Kafka by acting as a metadata management and coordination layer in Kafka's traditional architecture. ZooKeeper ensures reliability, consistency and fault tolerance in Kafka's distributed setup.
Spark
- Apache Spark is an open-source distributed computing system designed for fast, large-scale data processing. It was built for performance, especially for iterative algorithms in data science and machine learning.
RHEL
- Red Hat Enterprise Linux (RHEL) is a certified Linux operating system optimized for reliability, scalability and security in business and production environments.
Ansible
- Ansible is an open-source IT automation tool maintained by Red Hat that simplifies the management of servers, applications and infrastructure. It allows DevOps and system administrators to automate tasks such as configuration management, software deployment and orchestration. It uses simple, human-readable YAML files (called playbooks) and connects to managed hosts over SSH.
Prometheus
- Open-source monitoring and alerting toolkit used to collect, store and query metrics for the monitoring of infrastructure, services, containers and microservices.
Grafana
- Open-source analytics and visualization platform used for monitoring, observability and alerting. Commonly used with Prometheus.
KEY KNOWLEDGE AND EXPERIENCE
To be successful in your role, you will have demonstrated and/or acquired the following knowledge and experience:
Education and Experience
- Bachelor's degree in Information Technology, Computer Science, Engineering or a closely related discipline
- At least 5 years in Level 2 support for mission-critical 24x7 production environments, preferably in the public sector
- At least 2 years in a team lead or supervisory role, coordinating tasks and mentoring junior engineers
- Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis
- Certification in Red Hat Enterprise Linux or Kubernetes preferred
Knowledge / Skills
- Operating Systems: RHEL (90%) and Windows Server (10%)
- Networking Fundamentals
- Middleware & Infrastructure: Web Server (Nginx), App Servers, Kubernetes with containers (Docker, Spring Boot)
- Message Queues (IBM MQ, Kafka)
- Database (SQL Server, PostgreSQL)
- ITIL/ITSM Process Knowledge
- Security Awareness
- DR and HA concepts
- Strong Technical Skills
- Leadership & Coordination
- Communication & Collaboration
- Operational Governance
At Thales, we're committed to fostering a workplace where respect, trust, collaboration and passion drive everything we do. Here, you'll feel empowered to bring your best self, thrive in a supportive culture and love the work you do. Join us and be part of a team reimagining technology to create solutions that truly make a difference for a safer, greener and more inclusive world.