Job Title: AWS Databricks Data Engineer
Job Location: Los Angeles CA (Hybrid)
Hire type: FTE / CTH
Note: Only local California candidates will be considered.
Job Description
We are seeking a highly skilled AWS Data Engineer with strong expertise in SQL, Python, PySpark, data warehousing, and cloud-based ETL to join our data engineering team. The ideal candidate will design, implement, and optimize large-scale data pipelines, ensuring scalability, reliability, and high performance. This role requires close collaboration with cross-functional teams and business stakeholders to deliver modern, efficient data solutions.
Key Responsibilities
1. Data Pipeline Development
- Build and maintain scalable ETL/ELT pipelines using Databricks on AWS.
- Leverage PySpark/Spark and SQL to transform and process large, complex datasets.
- Integrate data from multiple sources, including S3, relational/non-relational databases, and AWS-native services (a brief PySpark sketch follows these bullets).
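To give a flavor of this work, here is a minimal PySpark sketch of a Databricks-on-AWS ingestion step; the S3 path, column names, and the main.bronze.orders table are hypothetical placeholders, not an actual pipeline.

```python
# Illustrative sketch only: bucket, column, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_bronze_ingest").getOrCreate()

# Read raw JSON order events landed in S3 (placeholder path).
raw = spark.read.json("s3://example-landing-bucket/orders/")

# Light cleanup: cast types, stamp ingestion time, drop duplicate events.
cleaned = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("ingested_at", F.current_timestamp())
       .dropDuplicates(["order_id"])
)

# Persist as a Delta table registered in the workspace catalog.
cleaned.write.format("delta").mode("append").saveAsTable("main.bronze.orders")
```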
2. Collaboration & Analysis
- Partner with downstream teams to prepare data for dashboards, analytics, and BI tools.
- Work closely with business stakeholders to understand requirements and deliver tailored, high-quality data solutions.
3. Performance & Optimization
- Optimize Databricks workloads for cost, performance, and efficient compute utilization.
- Monitor and troubleshoot pipelines to ensure reliability, accuracy, and SLA adherence.
- Apply query optimization, Spark tuning, and shuffle-minimization best practices when handling tens of millions of rows (see the sketch below).
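As a rough illustration of the tuning point above, the sketch below avoids an expensive shuffle by broadcasting a small dimension table and leans on Adaptive Query Execution; the main.silver.* and main.gold.* table names are assumed placeholders.

```python
# Illustrative sketch only: table names and sizes are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_gold_aggregate").getOrCreate()

# Adaptive Query Execution coalesces shuffle partitions and can switch
# join strategies at runtime (enabled by default on recent runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")

orders = spark.table("main.silver.orders")        # large fact table
customers = spark.table("main.silver.customers")  # small dimension table

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = orders.join(F.broadcast(customers), "customer_id", "left")

daily = (
    enriched.groupBy("order_date", "customer_segment")
            .agg(F.sum("amount").alias("revenue"))
)

daily.write.format("delta").mode("overwrite").saveAsTable("main.gold.daily_revenue")
```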
4. Governance & Security
- Implement and manage data governance, access control, and security policies using Unity Catalog (illustrative sketch after these bullets).
- Ensure compliance with organizational and regulatory data handling standards.
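For orientation only, a minimal sketch of Unity Catalog access control follows, assuming a hypothetical main catalog, gold schema, and data-analysts group.

```python
# Illustrative sketch only: catalog, schema, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uc_governance_setup").getOrCreate()

# Grant an analyst group read-only access to the curated (gold) schema.
# Privileges granted on a schema are inherited by the tables within it.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `data-analysts`")
spark.sql("GRANT SELECT ON SCHEMA main.gold TO `data-analysts`")
```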
5. Deployment & DevOps
- Use Databricks Asset Bundles to deploy jobs, notebooks, and configuration across environments.
- Maintain effective version control of Databricks artifacts using GitLab or similar tools.
- Use CI/CD pipelines to support automated deployments and environment setups.
Technical Skills (Required)
- Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse architecture, table triggers, Workflows, Delta Live Tables pipelines, Databricks Runtime, etc.).
- Proven ability to implement robust PySpark solutions.
- Hands-on experience with Databricks Workflows & orchestration.
- Solid knowledge of Medallion Architecture (Bronze/Silver/Gold).
- Significant experience designing or rebuilding batch-heavy data pipelines.
- Strong background in query optimization, performance tuning, and Spark shuffle optimization.
- Ability to handle and process tens of millions of records efficiently.
- Familiarity with Genie enablement concepts (understanding required; deep experience optional).
- Experience with CI/CD environment setup and Git-based development workflows.
- Solid understanding of AWS cloud, including:
  - IAM
  - Networking fundamentals
  - Storage integration (S3, Glue Data Catalog, etc.)
Preferred Experience
- Experience with Databricks Runtime configurations and advanced features.
- Knowledge of streaming frameworks such as Spark Structured Streaming (a brief sketch follows this list).
- Experience developing real-time or near real-time data solutions.
- Exposure to GitLab pipelines or similar CI/CD systems.
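For the streaming items above, here is a minimal near-real-time ingestion sketch using Spark Structured Streaming with Databricks Auto Loader; the S3 paths and target table are placeholders assumed for illustration.

```python
# Illustrative sketch only: paths and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_stream_ingest").getOrCreate()

# Incrementally pick up new JSON files from S3 with Auto Loader (cloudFiles).
events = (
    spark.readStream
         .format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")
         .load("s3://example-bucket/landing/orders/")
)

# Write to a Delta table; the checkpoint gives restartable, exactly-once progress.
(events.writeStream
       .format("delta")
       .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders")
       .trigger(availableNow=True)
       .toTable("main.bronze.orders_events"))
```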
Certifications (Optional)
- Databricks Certified Data Engineer Associate / Professional
- AWS Data Engineer or AWS Solutions Architect certification
Thanks & Regards
Akhil