Job Description: Databricks Architect (12 years)
Role Overview
The Databricks Architect is responsible for designing, implementing, and optimizing
scalable data engineering and analytics solutions on the Databricks Lakehouse
Platform on AWS. This role requires deep expertise in distributed data processing,
Delta Lake-based architectures, and modern data engineering best practices. The
architect will partner with cross-functional teams to define data strategies, ensure
platform reliability, and enable advanced analytics, ML, and BI workloads across the
enterprise.
Must Demonstrate (Critical Architectural Capabilities)
Designing Databricks-based Lakehouse architectures on AWS
Clear separation of compute layer vs. serving layer
Low-latency API/data delivery strategy (cannot rely solely on Spark)
Caching strategies for performance acceleration and cost efficiency
Data partitioning and optimization strategy including file-size tuning
Ability to handle multi-terabyte structured time-series datasets
Skill in distilling architectural significance from complex requirements
Strong curiosity and a requirement-probing mindset
Player-coach leadership style (hands-on engineering and design guidance)
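The file-size tuning called out above is usually reasoned about as simple arithmetic: given a dataset size and a target file size, derive how many files a Delta table should be compacted into. A minimal, Spark-free sketch of that reasoning (the 256 MB default and the helper name are illustrative choices, not from this posting):

```python
import math

def target_file_count(dataset_bytes: int, target_file_bytes: int = 256 * 1024**2) -> int:
    """Estimate how many files a table of `dataset_bytes` should be compacted
    into, given a target file size (default 256 MB, an illustrative mid-range
    value; the right target depends on workload and query patterns)."""
    if dataset_bytes <= 0:
        return 0
    return max(1, math.ceil(dataset_bytes / target_file_bytes))

# Example: a 10 TB time-series table at 256 MB per file.
files = target_file_count(10 * 1024**4)
```

In practice this kind of estimate informs compaction jobs (e.g. Delta `OPTIMIZE`) rather than being applied by hand.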
Key Responsibilities
Architecture & Design
Architect end-to-end Databricks Lakehouse solutions on AWS for ingestion,
processing, storage, and consumption.
Define and implement Delta Lake patterns, including the Medallion Architecture
(Bronze/Silver/Gold).
Lead design of scalable data pipelines using PySpark, Spark SQL, Workflows,
and Delta Live Tables.
Architect solutions for structured, semi-structured, and time-series workloads.
Ensure architectures support low-latency delivery, serving-layer separation,
and high performance.
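The Medallion (Bronze/Silver/Gold) pattern named above is progressive refinement: raw records land in Bronze, are cleaned and conformed into Silver, and aggregated into consumption-ready Gold. The sketch below illustrates the idea with plain Python over dict records; in a real pipeline each layer would be a Delta table written with PySpark, and all field and function names here are illustrative:

```python
from collections import defaultdict

def to_silver(bronze_rows):
    """Bronze -> Silver: drop malformed rows and normalize types."""
    silver = []
    for row in bronze_rows:
        if row.get("sensor_id") is None or row.get("reading") is None:
            continue  # quarantine/skip malformed input
        silver.append({"sensor_id": str(row["sensor_id"]),
                       "reading": float(row["reading"])})
    return silver

def to_gold(silver_rows):
    """Silver -> Gold: aggregate to a per-sensor average for consumption."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in silver_rows:
        totals[row["sensor_id"]] += row["reading"]
        counts[row["sensor_id"]] += 1
    return {sid: totals[sid] / counts[sid] for sid in totals}

bronze = [{"sensor_id": "a", "reading": "1.0"},
          {"sensor_id": "a", "reading": "3.0"},
          {"sensor_id": None, "reading": "9.9"}]  # malformed, dropped in Silver
gold = to_gold(to_silver(bronze))                 # {"a": 2.0}
```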
Engineering & Implementation
Build robust ETL/ELT pipelines using Databricks Notebooks, Jobs, and
Workflows.
Implement streaming and incremental data processing, where needed, using
Structured Streaming.
Optimize Spark jobs with partitioning, caching, ZORDER, file compaction, and
shuffle reduction.
Implement CI/CD automation using Databricks Repos, GitLab/GitHub, and
Infrastructure as Code.
AWS Cloud & Platform Expertise
Architect Databricks solutions using AWS-native services including:
o S3 (data storage)
o Glue Catalog (metadata governance)
o IAM (identity & access control)
o Lambda / API Gateway (low-latency serving mechanisms)
o Kinesis (streaming ingestion)
Ensure security, governance, and compliance via Unity Catalog, RBAC, and
encryption standards.
Monitor workloads and optimize cluster sizing, autoscaling, and cost controls.
Collaboration & Leadership
Partner with data engineers, ML engineers, BI teams, and business
stakeholders.
Serve as a Databricks SME, defining best practices, standards, and
architectural patterns.
Conduct architectural reviews and guide teams on solution choices.
Lead PoCs, evaluate new Databricks features, and drive platform adoption
across teams.
Quality, Governance & Observability
Define standards for data quality, lineage, observability, and operational
governance.
Implement automated testing frameworks for pipelines and notebooks.
Establish monitoring dashboards, performance baselines, and reliability KPIs.
Required Skills & Experience
Technical Skills
7+ years in data engineering or data architecture.
3+ years hands-on with Databricks.
Strong expertise in Spark, PySpark, SQL, and distributed data systems.
Deep understanding of Delta Lake features (ACID transactions, OPTIMIZE,
ZORDER, Auto Loader).
Experience with workflows/orchestration and Databricks REST APIs.
Hands-on expertise with AWS, specifically:
o S3
o Glue / Glue Catalog
o IAM
o Lambda
o Kinesis
Familiarity with CI/CD, Git, DevOps, and IaC (Terraform preferred).
Soft Skills
Strong analytical and problem-solving abilities.
Excellent communication and stakeholder management.
Ability to lead design discussions and guide engineering teams.
Strong documentation and architectural blueprinting abilities.
Preferred Qualifications
Databricks Certifications such as:
o Databricks Certified Data Engineer Professional
o Databricks Certified Machine Learning Professional
o Databricks Lakehouse Fundamentals
Experience with MLflow, Feature Store, or MLOps pipelines.
Experience in regulated industries (BFSI, healthcare, etc.).