Job Description:
- Administer and maintain the Cloudera Data Platform (CDP) across all environments (dev/test/prod).
- Strong expertise in the Big Data ecosystem, including Spark, Hive, Sqoop, HDFS, MapReduce, Oozie, YARN, HBase, and NiFi.
- Develop and optimize complex Hive queries including the use of analytical functions for reporting and data
transformation.
- Create custom UDFs in Hive to handle specific business logic and integration needs.
- Ensure efficient data ingestion and movement using Sqoop, NiFi, and Oozie workflows.
- Work with various data formats (CSV, TSV, Parquet, ORC, JSON, Avro) and compression techniques (Gzip, Snappy)
to optimize performance and storage efficiency.
- Monitor and tune performance of YARN and Spark applications for optimal resource utilization.
- In-depth knowledge of distributed systems architecture and parallel computing.
- Good knowledge of Oracle PL/SQL and shell scripting.
- Strong problem-solving and analytical thinking skills.
- Effective communication and documentation skills.
- Ability to collaborate across multi-disciplinary teams.
- Self-driven with the ability to manage multiple priorities under tight timelines.