Sr Advanced AI Data Engineer
Job Summary
As a Senior Advanced Data Engineer here at Honeywell, you will play a crucial role in designing, developing, and maintaining advanced data solutions that drive business insights and support decision-making processes. You will leverage your expertise in data engineering to build scalable data pipelines, optimize data storage, and ensure data quality and integrity.
Your ability to work with cross-functional teams and translate business requirements into technical solutions will be key to your success in this role.
In this role, you will impact the business by enabling data-driven decision-making, optimizing data processes, and improving overall data management. Your work will contribute to increased operational efficiency, cost savings, and enhanced customer satisfaction.
At Honeywell, our people leaders play a critical role in developing and supporting our employees to help them perform at their best and drive change across the company. Help to build a strong, diverse team by recruiting talent, identifying and developing successors, driving retention and engagement, and fostering an inclusive culture.
Responsibilities
AI-Ready Data Platform
- Design and implement end-to-end ingestion pipelines from heterogeneous sources (including Snowflake, SQL Server, Excel, REST APIs, and unstructured data) into Azure Databricks
- Architect and enforce Medallion Architecture (Bronze, Silver, Gold), ensuring data arrives clean, validated, and fit for purpose at each layer
- Build Delta Live Tables (DLT) pipelines with declarative data quality expectations, schema evolution, and automated lineage tracking
- Implement incremental loading patterns using CDC (Change Data Capture), watermarking, and Delta Lake MERGE/UPSERT for efficient, scalable ingestion
- Enable structured and unstructured data processing (documents, Excel files, JSON, Parquet), building the foundation for AI and ML consumption
Data Modeling & Semantic Layer
- Design and implement the Engineering data model (dimensional models, fact/dimension tables, and domain-specific data marts) serving analytics, BI, ML, and AI use cases
- Build a governed, reusable semantic layer on top of the Gold layer, enabling self-service analytics through Power BI and GCP-connected consumers
- Ensure data models are documented, versioned, and aligned to business domains within the VECE COE
Orchestration and Data Ops
- Build and manage Databricks Workflows with multi-task dependencies, SLA monitoring, retry logic, and alerting
- Implement CI/CD pipelines for Databricks using Azure DevOps and GitHub Actions, including Python Wheel packaging for reusable utility libraries deployed across the platform
- Apply software engineering best practices: version control, unit testing, modular code design, and automated deployment to Dev/QA/Prod environments
- Own cluster right-sizing, DBU management, Delta table optimization (VACUUM, compaction), and cost monitoring across Azure Databricks and GCP
Data Governance & Quality
- Implement and manage Unity Catalog for centralized data governance: three-level namespace (catalog, schema, table), fine-grained RBAC, data masking, and audit logging
- Build data quality frameworks (rule-based validation, deduplication, reconciliation, and anomaly detection), ensuring data arrives fit for AI/ML consumption
- Establish data lineage tracking across ingestion, transformation, and serving layers
- Govern data delivery to GCP, ensuring secure, validated, schema-consistent outputs consumed by downstream data science and analytics teams
AI & Proactive Analytics Foundation
- Design pipelines that are AI-ready from day one, supporting structured ML feature pipelines, embedding generation, and future Vector DB integrations
- Build the data infrastructure that enables the shift from descriptive dashboards to proactive, predictive analytics
- Collaborate with Data Scientists and Analytics Engineers to ensure the Gold layer supports model training, feature stores, and real-time inference pipelines
Qualifications
YOU MUST HAVE
- Databricks: 4 years hands-on with PySpark, Delta Lake, Workflows, and Unity Catalog.
- Demonstrated expertise in data strategy, for example Medallion Architecture, Domain Data Modeling, and Functional Data Architecture.
- Data Quality Frameworks (e.g., rule-based validation, anomaly detection)
- Data Pipelines: incremental loading, CDC, CI/CD, Observability
- Advanced Python/PySpark and advanced SQL
- Strongly preferred: DLT, UC, GCP, Azure, Kafka.
- Highly valued: Databricks Certified Professional
- 7 years of overall data engineering experience
- 4 years of hands-on Azure Databricks experience in production environments
- Proven experience building platforms, not just maintaining them: greenfield builds, migrations, framework development
- Experience with financial, engineering, enterprise, or industrial-scale datasets preferred
- Demonstrated ability to own technical decisions end-to-end, from architecture to production deployment
#LI-Hybrid
Required Experience:
Senior IC
About Company
Honeywell helps organizations solve the world's most complex challenges in automation, the future of aviation and energy transition. As a trusted partner, we provide actionable solutions and innovation through our Aerospace Technologies, Building Automation, Energy and Sustainability ...