Job Title: AWS Databricks Data Engineer
Job Location: Los Angeles CA (Hybrid)
Hire type: FTE / CTH
Note: Only local California candidates will be considered.
Job Description
We are seeking a highly skilled AWS Data Engineer with strong expertise in SQL, Python, PySpark, data warehousing, and cloud-based ETL to join our data engineering team. The ideal candidate will design, implement, and optimize large-scale data pipelines, ensuring scalability, reliability, and high performance. This role requires close collaboration with cross-functional teams and business stakeholders to deliver modern, efficient data solutions.
Key Responsibilities
1. Data Pipeline Development
- Build and maintain scalable ETL/ELT pipelines using Databricks on AWS.
- Leverage PySpark/Spark and SQL to transform and process large, complex datasets.
- Integrate data from multiple sources, including S3, relational/non-relational databases, and AWS-native services (a brief PySpark sketch follows these bullets).
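To give a flavor of this work, here is a minimal PySpark sketch of a Databricks-on-AWS ingestion step; the S3 path, column names, and the main.bronze.orders table are hypothetical placeholders, not an actual pipeline.

```python
# Illustrative sketch only: bucket, column, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_bronze_ingest").getOrCreate()

# Read raw JSON order events landed in S3 (placeholder path).
raw = spark.read.json("s3://example-landing-bucket/orders/")

# Light cleanup: cast types, stamp ingestion time, drop duplicate events.
cleaned = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("ingested_at", F.current_timestamp())
       .dropDuplicates(["order_id"])
)

# Persist as a Delta table registered in the workspace catalog.
cleaned.write.format("delta").mode("append").saveAsTable("main.bronze.orders")
```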
2. Collaboration & Analysis
- Partner with downstream teams to prepare data for dashboards, analytics, and BI tools.
- Work closely with business stakeholders to understand requirements and deliver tailored, high-quality data solutions.
3. Performance & Optimization
- Optimize Databricks workloads for cost, performance, and efficient compute utilization.
- Monitor and troubleshoot pipelines to ensure reliability, accuracy, and SLA adherence.
- Apply query optimization, Spark tuning, and shuffle-minimization best practices when handling tens of millions of rows (see the sketch below).
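As a rough illustration of the tuning point above, the sketch below avoids an expensive shuffle by broadcasting a small dimension table and leans on Adaptive Query Execution; the main.silver.* and main.gold.* table names are assumed placeholders.

```python
# Illustrative sketch only: table names and sizes are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_gold_aggregate").getOrCreate()

# Adaptive Query Execution coalesces shuffle partitions and can switch
# join strategies at runtime (enabled by default on recent runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")

orders = spark.table("main.silver.orders")        # large fact table
customers = spark.table("main.silver.customers")  # small dimension table

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = orders.join(F.broadcast(customers), "customer_id", "left")

daily = (
    enriched.groupBy("order_date", "customer_segment")
            .agg(F.sum("amount").alias("revenue"))
)

daily.write.format("delta").mode("overwrite").saveAsTable("main.gold.daily_revenue")
```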
4. Governance & Security
- Implement and manage data governance, access control, and security policies using Unity Catalog (illustrative sketch after these bullets).
- Ensure compliance with organizational and regulatory data handling standards.
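For orientation only, a minimal sketch of Unity Catalog access control follows, assuming a hypothetical main catalog, gold schema, and data-analysts group.

```python
# Illustrative sketch only: catalog, schema, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uc_governance_setup").getOrCreate()

# Grant an analyst group read-only access to the curated (gold) schema.
# Privileges granted on a schema are inherited by the tables within it.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `data-analysts`")
spark.sql("GRANT SELECT ON SCHEMA main.gold TO `data-analysts`")
```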
5. Deployment & DevOps
- Use Databricks Asset Bundles to deploy jobs, notebooks, and configuration across environments.
- Maintain effective version control of Databricks artifacts using GitLab or similar tools.
- Use CI/CD pipelines to support automated deployments and environment setups.
Technical Skills (Required)
- Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse architecture, table triggers, Workflows, Delta Live Tables pipelines, Databricks Runtime, etc.).
- Proven ability to implement robust PySpark solutions.
- Hands-on experience with Databricks Workflows & orchestration.
- Solid knowledge of Medallion Architecture (Bronze/Silver/Gold).
- Significant experience designing or rebuilding batch-heavy data pipelines.
- Strong background in query optimization, performance tuning, and Spark shuffle optimization.
- Ability to handle and process tens of millions of records efficiently.
- Familiarity with Genie enablement concepts (understanding required; deep experience optional).
- Experience with CI/CD environment setup and Git-based development workflows.
- Solid understanding of AWS cloud, including:
  - IAM
  - Networking fundamentals
  - Storage integration (S3, Glue Data Catalog, etc.)
Preferred Experience
- Experience with Databricks Runtime configurations and advanced features.
- Knowledge of streaming frameworks such as Spark Structured Streaming (a brief sketch follows this list).
- Experience developing real-time or near real-time data solutions.
- Exposure to GitLab pipelines or similar CI/CD systems.
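For the streaming items above, here is a minimal near-real-time ingestion sketch using Spark Structured Streaming with Databricks Auto Loader; the S3 paths and target table are placeholders assumed for illustration.

```python
# Illustrative sketch only: paths and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_stream_ingest").getOrCreate()

# Incrementally pick up new JSON files from S3 with Auto Loader (cloudFiles).
events = (
    spark.readStream
         .format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")
         .load("s3://example-bucket/landing/orders/")
)

# Write to a Delta table; the checkpoint gives restartable, exactly-once progress.
(events.writeStream
       .format("delta")
       .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders")
       .trigger(availableNow=True)
       .toTable("main.bronze.orders_events"))
```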
Certifications (Optional)
- Databricks Certified Data Engineer Associate / Professional
- AWS Data Engineer or AWS Solutions Architect certification
Thanks & Regards
Akhil