Hadoop ETL Developer and PySpark

Dallas, IA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Description: We are seeking a highly experienced Senior Big Data & DevOps Engineer with 8 years of professional experience in HDFS, Hive, Impala, PySpark, Python, and DevOps automation tools such as uDeploy and Jenkins. This role is responsible for managing end-to-end data operations, including HDFS table management, ETL pipeline development, multi-environment codebase governance, platform upgrades, and production support.

The ideal candidate will have strong expertise in Linux system operations and Big Data ecosystem tools, as well as experience with incident/change management using ServiceNow. This role plays a key part in ensuring the stability, scalability, and efficiency of enterprise data platforms while enabling seamless development-to-production workflows.
Key Responsibilities:
Big Data Platform Operations
Design, manage, and optimize HDFS directories, tables, and partitioning strategies.
Implement and enforce data retention and lifecycle policies across large datasets.
Administer Hive and Impala environments, ensuring high availability, performance tuning, and security compliance.
ETL Development & Data Engineering
Develop scalable ETL pipelines using PySpark, Hive, and Python.
Build reusable frameworks for data ingestion, transformation, and aggregation.
Optimize job performance through query tuning, resource management, and parallelization.
DevOps & Environment Management
Maintain and promote code across DEV, QA, UAT, and PROD environments.
Develop and support CI/CD pipelines using Jenkins and uDeploy for automated deployments.
Perform environment upgrades, patching, and dependency management aligned with release schedules.
Linux & Infrastructure Operations
Execute Linux administration tasks, including performance tuning, disk management, and scripting (Bash/Python).
Troubleshoot cluster-level issues, including node failures, job errors, and distributed system anomalies.
Change & Incident Management
Drive incident resolution and change execution using ServiceNow workflows.
Conduct root cause analysis (RCA) for critical issues and implement preventive solutions.
Ensure compliance with ITIL processes for change, incident, and problem management.
Collaboration & Technical Leadership
Partner with data engineers, developers, DevOps teams, and business analysts to ensure operational excellence.
Mentor junior engineers and contribute to technical leadership across the Big Data ecosystem.
Document operational procedures, troubleshooting guides, and architectural decisions for internal knowledge sharing.
Required Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field.
8 years of experience in Big Data engineering and DevOps practices.
Advanced proficiency in HDFS, Hive, Impala, PySpark, Python, and Linux.
Proven experience with CI/CD tools such as Jenkins and uDeploy.
Strong understanding of ETL development, orchestration, and performance optimization.
Experience with ServiceNow for incident/change/problem management.
Excellent analytical, troubleshooting, and communication skills.
Nice to Have:
Exposure to cloud-based Big Data platforms (AWS EMR).
Familiarity with containerization (Docker, Kubernetes) and infrastructure automation tools (Ansible, Terraform).
