Advanced Data Engineer Databricks

Honeywell

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Description

Advanced Data Engineer - Value Engineering & Component Engineering COE

Location: Bangalore, IN (Hybrid)

Role Overview: Honeywell's VECE COE is building a next-generation, AI-ready data platform to power advanced analytics, predictive insights, and data science at enterprise scale. As a Senior Data Engineer, you will be a founding technical pillar of this platform, designing and building the data infrastructure that transforms raw, multi-source data into governed, high-quality, analytics-ready assets.

This is not a maintenance role. You will architect, build, and own end-to-end data pipelines using Azure Databricks as the primary platform, following Medallion Architecture principles and delivering trusted data to downstream consumers in Google Cloud Platform (GCP). You will directly shape how Honeywell's VECE organization transitions from traditional descriptive analytics to proactive, AI-driven decision-making.

Responsibilities

What you will build

Data Pipelines & Ingestion

Implement end-to-end ingestion pipelines from heterogeneous sources (e.g. Snowflake, SQL Server, Excel, REST APIs, and unstructured files) into Azure Databricks, following defined architecture patterns
Build and maintain Bronze, Silver, and Gold Medallion layers, applying transformation logic, business rules, and quality checks at each stage
Implement incremental loading patterns (e.g. CDC, watermarking, Delta Lake MERGE/UPSERT) to ensure efficient, scalable, and reliable data delivery
Develop pipelines for structured and unstructured data (e.g. documents, JSON, Parquet, Excel) supporting downstream AI and ML consumption

Data Modeling & Semantic Layer

Implement and extend data models (e.g. fact/dimension tables, domain data marts) following designs defined by the Senior DE and AI team
Write clean, modular, reusable PySpark and SQL transformation logic that is testable, documented, and deployable via CI/CD
Contribute to the semantic layer that powers Power BI dashboards and GCP-connected analytics consumers
Maintain and improve existing models as business requirements evolve

Orchestration and Data Ops

Build and manage Databricks Workflows, configuring task dependencies, retry policies, and failure alerting
Follow and contribute to CI/CD practices: version control, pull requests, automated testing, and deployment to Dev/QA/Prod environments using Azure DevOps or GitHub Actions
Package and deploy reusable logic as Python libraries, following team standards
Monitor pipeline health, investigate failures, and resolve data issues within SLA

Data Governance & Quality

Apply data quality rules (e.g. validation, deduplication, null checks, reconciliation) within pipelines to ensure data arrives fit for purpose
Operate within the Unity Catalog governance framework, respecting the RBAC, namespace structure, and tagging standards defined by platform leads
Ensure data delivered to GCP is schema-consistent, validated, and documented
Flag and escalate data quality issues proactively, not reactively

FinOps Awareness

Write cost-conscious PySpark: avoid unnecessary full scans, optimize joins, and use appropriate cluster types
Apply Delta table best practices (e.g. VACUUM, OPTIMIZE, compaction) to manage storage costs
Follow cluster policies defined by platform leads and flag unusual resource consumption

Must Have

Databricks: 2 years hands-on with PySpark, Delta Lake, Workflows, and Unity Catalog
Demonstrated expertise in data strategy, for example Medallion Architecture, Domain Data Modeling, and Functional Data Architecture
Data Quality Frameworks (e.g. rule-based validation, anomaly detection)
Data Pipelines: incremental loading, CDC, CI/CD, observability
Advanced Python/PySpark and advanced SQL
Strongly preferred: DLT, UC, GCP, Azure, Kafka
Highly valued: Databricks Certified Professional

Qualifications

Experience

4-6 years of overall data engineering experience
2 years of hands-on Azure Databricks experience in production environments
Demonstrated ability to build and deliver pipelines, not just maintain or support them
Experience working within a defined architecture and contributing to its improvement
Comfortable working with multiple data source types: relational, file-based, and API

About Honeywell: Honeywell Industrial Automation enhances process industry operations, creates sensor technologies, automates supply chains, and improves worker safety. The VECE COE focuses on optimizing operational processes and driving sustainable growth.
