Job Summary:
We are seeking a Senior Java Spark Developer with expertise in Java, Apache Spark, and the Cloudera Hadoop ecosystem to design and develop large-scale data processing applications. The ideal candidate will have strong hands-on experience in Java-based Spark development, distributed computing, and performance optimization for handling big data workloads.
Key Responsibilities:
Java & Spark Development:
- Develop, test, and deploy Java-based Apache Spark applications for large-scale data processing (see the illustrative sketch below).
- Optimize and fine-tune Spark jobs for performance, scalability, and reliability.
- Implement Java-based microservices and APIs for data integration.
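Illustrative example (Java & Spark Development):
For context, the following is a minimal sketch of the kind of Java-based Spark batch job this role involves. All paths, column names, and thresholds are hypothetical and used only for illustration.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    import static org.apache.spark.sql.functions.col;

    public class OrdersAggregationJob {
        public static void main(String[] args) {
            // Build a SparkSession; master URL and executor resources are normally supplied by spark-submit.
            SparkSession spark = SparkSession.builder()
                    .appName("OrdersAggregationJob")
                    .getOrCreate();

            // Read a Parquet dataset from HDFS (the path is illustrative).
            Dataset<Row> orders = spark.read().parquet("hdfs:///data/orders");

            // Aggregate order totals per customer and keep only large accounts.
            Dataset<Row> totals = orders
                    .groupBy(col("customer_id"))
                    .sum("amount")
                    .withColumnRenamed("sum(amount)", "total_amount")
                    .filter(col("total_amount").gt(10000));

            // Write results back to HDFS, overwriting output from any previous run.
            totals.write().mode("overwrite").parquet("hdfs:///data/customer_totals");

            spark.stop();
        }
    }

A job like this is typically packaged as a JAR and launched with spark-submit, which supplies the cluster master and resource settings.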
Big Data & Cloudera Ecosystem:
- Work with Cloudera Hadoop components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop (see the illustrative Hive query sketch below).
- Design and implement high-performance data storage and retrieval solutions.
- Troubleshoot and resolve performance bottlenecks in Spark and Cloudera platforms.
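Illustrative example (Cloudera / Hive access from Spark):
The sketch below shows how a Spark application might query a Hive table through the Hive metastore on a CDH cluster. The database, table, and column names are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveQueryExample {
        public static void main(String[] args) {
            // Enable Hive support so Spark can resolve tables registered in the Hive metastore.
            SparkSession spark = SparkSession.builder()
                    .appName("HiveQueryExample")
                    .enableHiveSupport()
                    .getOrCreate();

            // Aggregate a Hive table with Spark SQL (database/table names are illustrative).
            Dataset<Row> totals = spark.sql(
                    "SELECT customer_id, SUM(amount) AS total " +
                    "FROM sales.transactions " +
                    "WHERE txn_date >= '2024-01-01' " +
                    "GROUP BY customer_id");

            totals.show(20);
            spark.stop();
        }
    }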
Collaboration & Data Engineering:
- Collaborate with data scientists, business analysts, and developers to understand data requirements.
- Implement data integrity, accuracy, and security best practices across all data processing tasks.
- Work with Kafka, Flume, Oozie, and NiFi for real-time and batch data ingestion (see the illustrative streaming sketch below).
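Illustrative example (Kafka ingestion with Spark Structured Streaming):
The sketch below shows one way a Spark Structured Streaming job might ingest a Kafka topic and land the raw payload on HDFS. The broker address, topic, and paths are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaIngestExample {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("KafkaIngestExample")
                    .getOrCreate();

            // Subscribe to a Kafka topic (broker address and topic name are illustrative).
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")
                    .option("subscribe", "events")
                    .load();

            // Kafka delivers key/value as binary; cast the payload to a string for downstream parsing.
            Dataset<Row> payload = events.selectExpr("CAST(value AS STRING) AS json");

            // Continuously append the raw payload to HDFS; a checkpoint location is required for fault tolerance.
            StreamingQuery query = payload.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/raw_events")
                    .option("checkpointLocation", "hdfs:///checkpoints/raw_events")
                    .start();

            query.awaitTermination();
        }
    }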
Software Development & Deployment:
- Implement version control (Git) and CI/CD pipelines (Jenkins, GitLab) for Spark applications.
- Deploy and maintain Spark applications in cloud or on-premises Cloudera environments.
Required Skills & Experience:
- 8 years of experience in application development with a strong background in Java and Big Data processing.
- Strong hands-on experience in Java, Apache Spark, and Spark SQL for distributed data processing.
- Proficiency in Cloudera Hadoop (CDH) components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop.
- Experience building and optimizing ETL pipelines for large-scale data workloads.
- Hands-on experience with SQL and NoSQL databases such as HBase, Hive, and PostgreSQL.
- Strong knowledge of data warehousing concepts, dimensional modeling, and data lakes.
- Proven ability to troubleshoot and optimize Spark applications for high performance.
- Familiarity with version control tools (Git, Bitbucket) and CI/CD pipelines (Jenkins, GitLab).
- Exposure to real-time and batch data ingestion technologies such as Kafka, Flume, Oozie, and NiFi.
- Strong problem-solving skills, attention to detail, and the ability to work in a fast-paced environment.