Key Responsibilities
Data Operations & Pipeline Support
- Assist in ingesting, collecting, validating, and storing structured/unstructured batch data arriving through edge nodes or direct DB connections
- Support ETL/ELT jobs running on Hadoop, Hive, Impala, and Spark
- Monitor daily data loads, troubleshoot failures, and ensure data availability for analytics use cases
- Maintain HDFS directory structures, Hive tables, and data partitions
- Perform file-level data quality checks, checksum validations, and table-level validations for data consistency
Platform & Infrastructure Operations
- Support the operation of on-prem Hadoop clusters (Cloudera)
- Assist in OS-level tasks: log checks, service restarts, disk usage monitoring, and user/permission handling
- Assist in regular Big Data cluster health checks
- Support platform upgrades, patches, configuration changes, and security hardening efforts managed by the senior engineer
- Work with network and system teams during installation, troubleshooting, or hardware issues
Tools & Technologies
- Assist in running and maintaining data flows involving Hive, Impala, HDFS, Spark, Kafka (basic), HBase (basic), and Linux environments
- Use tools such as NiFi and SFTP for data movement, including NiFi flow development and NiFi cluster management
- Support API-based data push/pull where required for integrations
Data Governance & Documentation
- Maintain metadata, data dictionary updates, and platform documentation
- Ensure compliance with Kerberos/LDAP authentication and Cloudera Navigator governance processes
- Maintain operational runbooks and record incident logs
Collaboration & Support
- Work under the senior engineer to ensure continuous operations of the client environment
- Participate in joint troubleshooting with the client team during data-source onboarding
- Provide L1/L2 support for data ingestion, cluster operations, and daily job executions
Work Complexity and Role Expectation
- Work on assigned operational tasks within the Big Data platform under guidance
- Support development, testing, and automation of simple data flows
- Assist in routine batch workloads and testbed validations
- Participate as a team member in platform enhancements, monitoring improvements, and data integration activities
Person Specifications
Education
- Bachelor's degree in Computer Science, IT, Electronics/Telecom Engineering, or a related field
Technical Skills
- Basic knowledge of the Hadoop ecosystem: HDFS, Hive, Spark, YARN (hands-on exposure is an added benefit)
- Familiarity with Linux shell commands; ability to navigate logs and services
- Good understanding of SQL; able to write and troubleshoot complex queries
- Exposure to Python/Scala/Java is an added advantage
- Basic understanding of data pipelines, ETL processes, and batch data workflows
- Exposure to the Cloudera platform is a plus
Experience
- 1–2 years of experience in data engineering, database operations, or Big Data platform support
- Experience in the telecom domain or enterprise data environments is an added advantage
Soft Skills
- Good analytical and troubleshooting mindset
- Ability to collaborate with senior engineers and follow structured operational practices
- Effective communication and willingness to learn complex distributed systems