Key Responsibilities
Data Operations & Pipeline Support
- Assist in ingesting, collecting, validating, and storing structured/unstructured batch data arriving through edge nodes or direct DB connections
- Support ETL/ELT jobs running on Hadoop, Hive, Impala, and Spark
- Monitor daily data loads, troubleshoot failures, and ensure data availability for analytics use cases
- Maintain HDFS directory structures, Hive tables, and data partitions
- Perform file-level data quality checks, checksum validations, and table-level validations for data consistency
Platform & Infrastructure Operations
- Support the operation of on-prem Hadoop clusters (Cloudera)
- Assist in OS-level tasks: log checks, service restarts, disk usage monitoring, and user/permission handling
- Assist in regular Big Data cluster health checks
- Support platform upgrades, patches, configuration changes, and security hardening efforts managed by the senior engineer
- Work with network and system teams during installation, troubleshooting, or hardware issues
Tools & Technologies
- Assist in running and maintaining data flows involving Hive, Impala, HDFS, Spark, Kafka (basic), HBase (basic), and Linux environments
- Use tools such as NiFi and SFTP for data movement, including NiFi flow development and NiFi cluster management
- Support API-based data push/pull where required for integrations
Data Governance & Documentation
- Maintain metadata, data dictionary updates, and platform documentation
- Ensure compliance with Kerberos/LDAP authentication and Cloudera Navigator governance processes
- Maintain operational runbooks and record incident logs
Collaboration & Support
- Work under the senior engineer to ensure continuous operations of the client environment
- Participate in joint troubleshooting with the client team during data-source onboarding
- Provide L1/L2 support for data ingestion, cluster operations, and daily job executions
Work Complexity and Role Expectation
- Work on assigned operational tasks within the Big Data platform under guidance
- Support development, testing, and automation of simple data flows
- Assist in routine batch workloads and testbed validations
- Participate as a team member in platform enhancements, monitoring improvements, and data integration activities
Person Specifications
Education
- Bachelor's degree in Computer Science, IT, Electronics/Telecom Engineering, or a related field
Technical Skills
- Basic knowledge of the Hadoop ecosystem: HDFS, Hive, Spark, YARN (hands-on exposure is an added benefit)
- Familiarity with Linux shell commands; ability to navigate logs and services
- Good understanding of SQL; able to write and troubleshoot complex queries
- Exposure to Python/Scala/Java is an added advantage
- Basic understanding of data pipelines, ETL processes, and batch data workflows
- Exposure to the Cloudera platform is a plus
Experience
- 1–2 years of experience in data engineering, database operations, or Big Data platform support
- Experience in the telecom domain or enterprise data environments is an added advantage
Soft Skills
- Good analytical and troubleshooting mindset
- Ability to collaborate with senior engineers and follow structured operational practices
- Effective communication and willingness to learn complex distributed systems