Data Engineer who can work on a data lake system that has been implemented on Databricks.
Required Skills:
Must be proficient in Scala and in major frameworks such as Spark and Kafka (any variant). Writes elegant, maintainable code and is comfortable picking up new technologies.
Proficient in working with distributed systems, with experience across distributed processing frameworks that handle data in both batch and near-real-time pipelines (a minimal sketch of such a pipeline follows this list)
Experience working with AWS cloud services and Databricks (preferred) to build end-to-end data lakehouse solutions that bring different systems together
Ability to understand and implement metadata-based data processing frameworks and methodologies
Experience working with CI/CD processes - Jenkins and GitHub Actions
Experience working with monitoring and alerting stacks - Prometheus/Grafana
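As an illustration of the batch/near-real-time requirement above, the following is a minimal sketch of a streaming ingestion job, assuming a Spark runtime with the Kafka connector and Delta Lake available; the broker address, topic name, and storage paths are hypothetical placeholders, not part of this role's actual stack.

# Minimal sketch: Kafka -> Delta Lake with PySpark Structured Streaming.
# Broker, topic, and paths below are illustrative placeholders (assumptions).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream-ingest").getOrCreate()

# Subscribe to a Kafka topic as a streaming source.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to string for downstream parsing.
events = raw.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_time"),
)

# Append to a Delta table; the checkpoint makes the stream restartable.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/orders")
    .outputMode("append")
    .start("/mnt/lake/bronze/orders")
)

query.awaitTermination()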
Key job responsibilities:
Drive the backbone of our organizational data platform by building robust pipelines that turn complex data into actionable insights using AWS and the Databricks platform
Design scalable, maintainable data systems with strong observability
Monitor, log, alert, and ensure system robustness
Build efficient pipelines for structured and unstructured data
Interface with product, engineering, regulatory, and legal teams as necessary to ensure the Lakehouse's compliance adherence
Write clean, efficient code that handles massive amounts of structured and unstructured data
Design and build ETL/ELT pipelines to move data into the data lake
Ingest data from multiple sources (databases, APIs, files, streaming, etc.)
Transform and clean raw data using Databricks (PySpark/SQL/Delta Lake); see the sketch after these items
Organize data for easy access by analysts and data scientists
Ensure data quality, consistency, and reliability
Monitor and optimize pipeline performance and cost
Work with business and analytics teams to make sure data meets their needs
Maintain documentation and follow best practices for data governance and security
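To make the transform-and-organize responsibilities above concrete, here is a minimal sketch of a batch Delta Lake transformation with a basic quality gate; it assumes a Databricks/Spark environment with Delta Lake, and the table paths, column names, and rules are illustrative assumptions rather than the team's actual schema.

# Minimal sketch: clean a raw (bronze) Delta table into a curated (silver) one.
# Paths, columns, and quality rules are illustrative placeholders (assumptions).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

# Read raw data previously ingested into the lake.
bronze = spark.read.format("delta").load("/mnt/lake/bronze/orders")

# Clean and standardize: deduplicate, drop null keys, derive and cast columns.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("event_time"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Simple data quality gate: fail fast if any row violates a basic expectation.
bad_rows = silver.filter(F.col("amount") < 0).count()
if bad_rows > 0:
    raise ValueError(f"Quality check failed: {bad_rows} rows with negative amount")

# Write the curated table, partitioned for efficient downstream queries.
(
    silver.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("/mnt/lake/silver/orders")
)

In practice, logic like this would typically live in a Databricks notebook or job and be scheduled, monitored, and cost-optimized as part of the pipeline.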
Education
Any Degree