Role: Data Engineer with Hadoop
Location: Columbus, OH & Jersey City, NJ
Mode: Contract
Job Description:
We are seeking a highly skilled Data Engineer with hands-on experience in Hadoop-to-Databricks migration projects. The ideal candidate will have a strong background in the AWS cloud platform, Databricks, PySpark, and data observability/monitoring with Splunk.
You will play a critical role in the modernization of big data platforms by transforming legacy Hadoop systems into scalable and efficient Databricks-based pipelines.
Key Responsibilities:
- Lead or contribute to the migration of data pipelines and jobs from the Hadoop ecosystem to Databricks on AWS.
- Develop and optimize PySpark jobs for data ingestion, transformation, and processing.
- Build scalable and efficient data solutions in the Databricks platform.
- Collaborate with data architects, analysts, and business teams to ensure data models and pipelines meet business requirements.
- Monitor and troubleshoot production workflows using Splunk or other observability tools.
- Ensure data quality, performance tuning, and governance standards are met during the migration process.
Required Skills:
- Strong hands-on experience with Databricks and PySpark.
- Proven experience with the Hadoop ecosystem and its components (Hive, HDFS, etc.).
- Proficiency in AWS services (S3, EMR, Glue, Lambda, etc.).
- Experience in creating and managing ETL/ELT pipelines in a distributed environment.
- Knowledge of Splunk for log monitoring, alerting, and operational insights.
- Strong understanding of big data architecture, performance optimization, and cost management.
Preferred Qualifications:
- Experience in enterprise-scale data lake or lakehouse implementations.
- Familiarity with CI/CD practices for data pipelines.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.