Key Responsibilities:
Data Ingestion and Integration:
Develop and maintain data ingestion processes to collect data from various sources.
Integrate data from different platforms and databases into a unified data lake.
Data Processing:
Create data processing jobs using Hive and PySpark for large-scale data transformation.
Optimize data processing workflows to ensure efficiency and performance.
Data Pipeline Development:
Design and implement ETL pipelines to move data from raw to processed formats.
Monitor and troubleshoot data pipelines, ensuring data quality and reliability.
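The raw-to-processed ETL pattern above can be sketched as follows. This is a minimal illustration in plain Python with hypothetical record fields (`user_id`, `amount`); in this role the equivalent logic would run as Hive or PySpark jobs over the data lake.

```python
# Minimal raw-to-processed ETL sketch with a basic data-quality check.
# Field names (user_id, amount) are illustrative assumptions, not part
# of any real schema.

def extract(raw_records):
    """Simulate pulling raw records from a source system."""
    return list(raw_records)

def transform(records):
    """Normalize fields and drop records that fail quality checks."""
    processed = []
    for rec in records:
        # Quality check: require a non-empty user_id.
        if not rec.get("user_id"):
            continue
        processed.append({
            "user_id": rec["user_id"].strip().lower(),
            "amount": float(rec.get("amount", 0)),
        })
    return processed

def load(records, target):
    """Append processed records to the target store (a list here)."""
    target.extend(records)
    return len(records)

raw = [
    {"user_id": " Alice ", "amount": "10.00"},
    {"user_id": "", "amount": "3.50"},  # fails the quality check
    {"user_id": "Bob", "amount": "7"},
]
processed_store = []
loaded = load(transform(extract(raw)), processed_store)
```

Separating extract, transform, and load into distinct steps is what makes each stage easy to monitor and troubleshoot independently.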
Data Modeling and Optimization:
Develop data models for efficient querying and reporting using Hive.
Implement performance tuning and optimization strategies for Hadoop and Spark.
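A common way the modeling and tuning duties above come together in Hive is a partitioned, columnar table. The table and column names below are hypothetical, for illustration only:

```sql
-- Hypothetical Hive table: partitioning by event_date lets queries prune
-- irrelevant partitions, and ORC storage speeds up column scans.
CREATE TABLE IF NOT EXISTS events_processed (
    user_id    STRING,
    event_type STRING,
    amount     DOUBLE
)
PARTITIONED BY (event_date DATE)
STORED AS ORC;
```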
Data Governance:
Implement data security and access controls to protect sensitive information.
Ensure compliance with data governance policies and best practices.
Collaboration:
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and provide data support.
Qualifications:
Preferred Qualifications:
Full Time