Key responsibilities:
- Working with clients to understand their data.
- Building data structures and pipelines based on that understanding.
- Working on the application end to end, collaborating with UI and other development teams.
- Building the data pipelines that migrate and load data into HDFS, either on-premises or in the cloud (a minimal illustrative sketch follows this list).
- Developing data ingestion, processing, and integration pipelines.
- Creating Hive data structures and metadata, and loading data into data lakes / big data warehouse environments.
- Optimizing (performance tuning) data pipelines to minimize cost.
- Keeping code version control and the Git repository up to date.
- Building and maintaining CI/CD for the data pipelines.
- Managing unit testing of all data pipelines.
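The responsibilities above centre on ingesting data into HDFS and Hive. Purely as an illustrative sketch (not part of the role description itself), a minimal PySpark pipeline of that kind might look as follows, assuming a cluster with Hive support enabled; the paths, database, and table names are hypothetical.

```python
# A minimal PySpark ingestion sketch, assuming a Spark cluster with Hive
# support; all paths, database, and table names below are hypothetical.
from pyspark.sql import SparkSession

def main():
    spark = (
        SparkSession.builder
        .appName("csv-to-hive-ingest")
        .enableHiveSupport()          # lets Spark read/write Hive metastore tables
        .getOrCreate()
    )

    # Read raw CSV files that have already been landed in HDFS.
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("hdfs:///data/landing/orders/")   # hypothetical landing path
    )

    # Light cleanup before loading into the curated zone.
    cleaned = raw.dropDuplicates().na.drop(subset=["order_id"])

    # Write as a partitioned Hive table in the data lake.
    (
        cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .format("parquet")
        .saveAsTable("lake_curated.orders")     # hypothetical database.table
    )

    spark.stop()

if __name__ == "__main__":
    main()
```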
Requirements
Skills & Experience:
- Bachelor's degree in computer science or a related field.
- Minimum of 5 years of working experience with the Spark and Hadoop ecosystems.
- Minimum of 4 years of experience designing data streaming pipelines.
- Expert in at least one of Python, Scala, or Java.
- Experience with data ingestion and integration into a data lake using Hadoop ecosystem tools such as Sqoop, Spark SQL, Hive, and Airflow (see the orchestration sketch after this section).
- Experience optimizing (performance tuning) data pipelines.
- Minimum of 3 years of experience with NoSQL and Spark Streaming.
- Knowledge of Kubernetes and Docker is a plus.
- Experience with cloud services, either Azure or AWS.
- Experience with an on-premises Hadoop distribution such as Cloudera, Hortonworks, or MapR.
- Basic understanding of CI/CD pipelines.
- Basic knowledge of the Linux environment and commands.
Keywords: Azure; Spark and Hadoop; data pipelines; Oozie, Airflow, or Azure Data Factory; Python, Scala, or Java; NoSQL and Spark; Kubernetes or Docker; Cloudera, Hortonworks, or MapR; CI/CD pipelines.
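Several of the listed skills (Airflow orchestration, Spark, CI/CD-managed pipelines) typically come together in a scheduled DAG. As a rough sketch only, assuming Airflow 2.x with spark-submit available on the worker nodes; the DAG id, schedule, and script path are hypothetical.

```python
# A minimal Airflow 2.x DAG sketch; the DAG id, schedule, and job script
# path are hypothetical, and spark-submit is assumed to be on the worker.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_ingest",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark ingestion job sketched earlier to the cluster.
    ingest = BashOperator(
        task_id="spark_ingest",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/csv_to_hive_ingest.py"   # hypothetical script path
        ),
    )

    # A lightweight validation step could follow the ingest task.
    quality_check = BashOperator(
        task_id="row_count_check",
        bash_command="echo 'placeholder for a real data-quality check'",
    )

    ingest >> quality_check
```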
Education
Bachelor's degree