Job Title: Data Engineer (Hadoop to Databricks Migration)
Location: Columbus, OH / Jersey City, NJ
Duration: Contract
Experience: 8 years (minimum 1 year in Databricks and PySpark)
Job Description: We are seeking a highly skilled Data Engineer with hands-on experience in Hadoop-to-Databricks migration projects. The ideal candidate will have a strong background in the AWS cloud platform, Databricks, PySpark, and data observability/monitoring using Splunk.
You will play a critical role in modernizing big data platforms by transforming legacy Hadoop systems into scalable, efficient Databricks-based pipelines.
Key Responsibilities: - Lead or contribute to the migration of data pipelines and jobs from the Hadoop ecosystem to Databricks on AWS.
- Develop and optimize PySpark jobs for data ingestion, transformation, and processing.
- Build scalable and efficient data solutions on the Databricks platform.
- Collaborate with data architects, analysts, and business teams to ensure data models and pipelines meet business requirements.
- Monitor and troubleshoot production workflows using Splunk or other observability tools.
- Ensure data quality, performance, and governance standards are met throughout the migration process.
Required Skills: - Strong hands-on experience with Databricks and PySpark.
- Proven experience with the Hadoop ecosystem and its components (Hive, HDFS, etc.).
- Proficiency in AWS services (S3, EMR, Glue, Lambda, etc.).
- Experience in creating and managing ETL/ELT pipelines in a distributed environment.
- Knowledge of Splunk for log monitoring, alerting, and operational insights.
- Strong understanding of big data architecture, performance optimization, and cost management.
Preferred Qualifications: - Experience in enterprise-scale data lake or lakehouse implementations.
- Familiarity with CI/CD practices for data pipelines.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.