Company Details
We started in early 2019 as a small group of technologists with a passion for making insurance better. Today we are working with a team of industry experts who run five different insurance brands and collectively control $1 billion in annual premiums.
We believe in ideas and execution: in other words, a place where the best ideas win and the people who deliver the most value get the most opportunities.
As we grow our team, we are looking for inquisitive, entrepreneurial people who are excited to reimagine the insurance industry.
Insurance is too complex. Help us make it better.
Responsibilities
This position requires on-site work Monday through Thursday at either our Manassas, VA or Chesterfield, MO location.
The Databricks Data Engineer will help design, build, deploy, and maintain scalable, production-grade data pipelines in modern cloud environments, enabling analytics, AI/ML, and decision advantage at scale. This role will work with cutting-edge tools like Databricks, Delta Lake, PySpark, and AI/BI Genie to transform raw data into actionable insights. As a hands-on Databricks Data Engineer with deep expertise in Azure Databricks and MLOps, this role will have the opportunity to migrate and translate legacy SSIS ETL logic into scalable, cloud-native data pipelines in Databricks. This role will partner with data engineers, data scientists, and product managers to design features, train and evaluate models, and deploy them to production using MLflow and Databricks Workflows, with rigorous observability, governance (Unity Catalog), and CI/CD automation.
Data Pipeline Engineering
- Design, build, and maintain high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark.
- Convert and modernize existing SSIS package logic into cloud-native Databricks pipelines using PySpark notebooks, Delta Live Tables (DLT), and Databricks Workflows.
- Implement reliable batch and streaming pipelines with robust data quality and validation frameworks.
- Optimize pipeline performance using Photon, efficient file formats, partitioning, Z-ordering, and caching strategies.
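As one illustration of the data quality and validation checks mentioned above, here is a minimal, pure-Python sketch. It is not Databricks-specific, and the field names and rules are hypothetical; in a real pipeline this logic would typically live in DLT expectations or a PySpark filter that quarantines failing rows.

```python
# Minimal sketch of a row-level validation step, as might gate promotion
# of raw records into a validated layer. Field names and rules are
# hypothetical examples, not a prescribed schema.

def validate_policy_record(record: dict) -> list:
    """Return a list of rule violations for one raw record."""
    errors = []
    if not record.get("policy_id"):
        errors.append("missing policy_id")
    premium = record.get("annual_premium")
    if not isinstance(premium, (int, float)) or premium < 0:
        errors.append("invalid annual_premium")
    if record.get("state") not in {"VA", "MO"}:
        errors.append("unknown state")
    return errors

def split_valid_invalid(records):
    """Partition records into (valid, quarantined-with-reasons)."""
    valid, quarantined = [], []
    for rec in records:
        errs = validate_policy_record(rec)
        if errs:
            quarantined.append((rec, errs))
        else:
            valid.append(rec)
    return valid, quarantined
```

Keeping failed rows with their violation reasons, rather than silently dropping them, is what makes the pipeline auditable.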
Lakehouse Platform Development
- Develop and manage datasets within Delta Lake, ensuring ACID reliability, schema evolution, versioning, and time travel.
- Architect feature-rich data layers, including:
- Bronze (raw ingestion)
- Silver (validated, conformed)
- Gold (analytics-ready and ML-ready)
- Implement data governance using Unity Catalog for fine-grained access control, lineage, auditability, and metadata management.
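The Bronze/Silver/Gold layering above can be pictured with a toy, pure-Python sketch. In practice each layer would be a Delta table transformed with PySpark or DLT; the record shapes here are hypothetical stand-ins.

```python
# Toy medallion-architecture sketch: lists of dicts stand in for Delta
# tables. Bronze = raw ingestion, Silver = validated/conformed,
# Gold = aggregated, analytics-ready. Field names are hypothetical.

bronze = [
    {"claim_id": "C1", "amount": "250.00", "state": "va"},
    {"claim_id": "C2", "amount": "bad",    "state": "MO"},
    {"claim_id": "C3", "amount": "100.50", "state": "VA"},
]

def to_silver(rows):
    """Validate and conform: parse amounts, normalize state codes.
    Rows that fail parsing are dropped here (a real pipeline would
    quarantine them instead)."""
    out = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        out.append({"claim_id": r["claim_id"],
                    "amount": amount,
                    "state": r["state"].upper()})
    return out

def to_gold(rows):
    """Aggregate to an analytics-ready summary: total claim amount by state."""
    totals = {}
    for r in rows:
        totals[r["state"]] = totals.get(r["state"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the layering is that each table downstream is strictly more trustworthy than the one before it, so analytics and ML consumers only ever read Silver or Gold.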
MLOps & ML-Enabled Data Pipelines
- Partner with data scientists and data engineers to create feature pipelines, model training pipelines, and production scoring pipelines.
- Deploy and operationalize models using MLflow, the Databricks Model Registry, and Databricks Workflows.
- Use Databricks built-in AI SQL functions such as ai_query, ai_forecast, and ai_analyze_sentiment to generate actionable insights from large amounts of structured and unstructured raw data.
- Implement monitoring for:
- Pipeline failures
- Data/feature drift
- Model performance degradation
- Operational SLAs/SLIs/SLOs
- Build automated CI/CD workflows using GitHub Actions or Azure DevOps for notebook deployment, pipeline testing, and environment promotion.
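Of the monitoring concerns listed above, data/feature drift is the least standardized. One common approach is the Population Stability Index (PSI); a minimal sketch follows, assuming equal-width bins over the baseline's range and the conventional (not prescribed) rule of thumb that PSI > 0.2 signals significant drift.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a
    current sample of one numeric feature. Bins are equal-width over
    the baseline's range; values outside that range clamp to the edge
    bins. A common rule of thumb flags PSI > 0.2 as significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

In production this check would run on a schedule against the features feeding each scoring pipeline, with breaches raising the same alerts as pipeline failures.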
Data Platform, Data Security & Data Governance
- Collaborate with data engineers to design reliable data products on Delta Lake; leverage Delta Live Tables (DLT) for declarative pipelines when applicable.
- Enforce Unity Catalog for lineage, permissions, and auditing; manage secrets, tokens, and keys securely (e.g., Databricks secrets, Key Vault/Secrets Manager).
Collaboration & Leadership
- Work closely with cross-functional teams: data engineers, data scientists, product managers, and business stakeholders.
- Serve as a Databricks SME, championing best practices, code standards, governance, and reusable frameworks.
- Document architecture, workflows, data models, runbooks, and operational procedures.
Qualifications
- Minimum of 3 years of experience with Databricks, PySpark notebooks, Python, DevOps, software development, and data engineering.
- Databricks Certified Data Engineer Associate or Professional certification is a plus.
Skills & Competencies
- Proficient in designing, building, deploying, and maintaining high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark notebooks.
- Proficient in building, deploying, and operating production ML models (supervised, unsupervised, and anomaly detection), including techniques for imbalanced datasets.
- Proficient with ML engineering and MLOps, including model versioning, CI/CD for ML, monitoring, drift detection, and automated retraining.
- Proficiency in Python, including Pandas and PySpark DataFrames.
- Expert-level SQL skills, including stored procedures; experience with SSIS, SSRS, and Power BI is a plus.
- Proficient with cloud data engineering platforms such as Azure Databricks, Spark, and SQL, including batch and streaming pipelines.
- Familiar with Databricks built-in AI functions such as ai_query, ai_gen, ai_classify, ai_forecast, and ai_analyze_sentiment, and able to use them to extract actionable insights from large amounts of structured and unstructured raw data.
- Experience with Python and ML frameworks such as PyTorch or TensorFlow
- Experience improving data quality, lineage, and observability in enterprise data environments, and operationalizing rule- and model-driven scoring for prioritization, routing, or case selection.
- Experience with predictive analytics, machine learning, and artificial intelligence is desired.
Education
- A Bachelor's degree in Computer Science, Management Information Systems, Engineering, Math, Physics, or a related quantitative field is required (4-year degree). A Master's degree is preferred.
- Experience in the commercial insurance industry is a plus.
Additional Company Details
The Company is an equal employment opportunity employer.
We do not accept any unsolicited resumes from external recruiting firms.
The company offers a competitive compensation plan and a robust benefits package for full-time regular employees.
Base salary and benefits include health, dental, vision, life, and disability insurance; wellness programs; paid time off; and 401(k) and profit-sharing plans.
The actual salary for this position will be determined by a number of factors, including the scope, complexity, and location of the role; the skills, education, training, credentials, and experience of the candidate; and other conditions of employment.
Additional Requirements
Ability to travel locally and nationally up to 5% of the time
Sponsorship Details
Sponsorship is not offered for this role.
Required Experience:
IC