10 years' overall experience in data management, data lakes, and data warehouses.
6 years with Hadoop, Hive, Sqoop, SQL, and Teradata.
6 years with PySpark (Python and Spark) and Unix.
Good to have: industry-leading ETL experience.
Banking domain experience.
Key Responsibilities:
Ability to design, build, and unit test applications on the Spark framework in Python.
Build PySpark-based applications for both batch and streaming requirements, which calls for in-depth knowledge of the Hadoop ecosystem and most NoSQL databases as well.
Develop and execute data pipeline testing processes and validate business rules and policies.
Optimize the performance of Spark applications on Hadoop by tuning configurations around SparkContext, Spark SQL, DataFrames, and pair RDDs.
Optimize data access performance by choosing the appropriate native Hadoop file format (Avro, Parquet, ORC, etc.) and compression codec for each workload.
Ability to design and build real-time applications using Apache Kafka and Spark Streaming.
Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We always verify that our clients do not solicit money payments, so we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please reach us via the Contact Us page.