Job Title: Data Engineer
Location: Dallas, TX
Job Summary:
As a Databricks Lead, you will be a critical member of our data engineering team, responsible for designing, developing, and optimizing our data pipelines and platforms on Databricks, primarily leveraging AWS services. You will play a key role in implementing robust data governance with Unity Catalog and ensuring cost-effective data solutions. This role requires a strong technical leader who can mentor junior engineers, drive best practices, and contribute hands-on to complex data challenges.
Responsibilities:
* Databricks Platform Leadership:
* Lead the design, development, and deployment of large-scale data solutions on the Databricks platform.
* Establish and enforce best practices for Databricks usage, including notebook development, job orchestration, and cluster management.
* Stay abreast of the latest Databricks features and capabilities, recommending and implementing improvements.
* Data Ingestion and Streaming (Kafka):
* Architect and implement real-time and batch data ingestion pipelines using Apache Kafka for high-volume data streams.
* Integrate Kafka with Databricks for seamless data processing and analysis.
* Optimize Kafka consumers and producers for performance and reliability.
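The Kafka-to-Databricks integration described above is commonly built with Spark Structured Streaming. A minimal sketch follows, assuming a Databricks runtime with the built-in Kafka connector; the broker addresses, topic name, target table, and checkpoint path are illustrative placeholders, not part of the posting:

```python
# Sketch: stream events from Kafka into a Delta table on Databricks.
# Brokers, topic, table, and checkpoint path are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")           # topic to consume
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key and value as binary; cast to string before parsing.
events = raw.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"),
)

(
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/events")
    .toTable("bronze.events")
)
```

The checkpoint location is what gives the pipeline exactly-once delivery into Delta across restarts, which is the usual reliability lever for the consumer side.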
* Data Governance and Management (Unity Catalog):
* Implement and manage data governance policies and access controls using Databricks Unity Catalog.
* Define and enforce data cataloging, lineage, and security standards within the Databricks Lakehouse.
* Collaborate with data governance teams to ensure compliance and data quality.
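In Unity Catalog, access-control policies like those above are typically expressed as SQL grants. A small illustrative sketch (catalog, schema, and group names are placeholders):

```sql
-- Illustrative Unity Catalog grants; names are placeholders.
CREATE CATALOG IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS analytics.sales;

-- Read-only access for analysts, full control for the engineering group.
GRANT USE CATALOG ON CATALOG analytics TO `data-analysts`;
GRANT USE SCHEMA, SELECT ON SCHEMA analytics.sales TO `data-analysts`;
GRANT ALL PRIVILEGES ON SCHEMA analytics.sales TO `data-engineers`;
```

Grants on a catalog or schema cascade to the objects inside it, which is what makes group-level governance manageable at Lakehouse scale.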
* AWS Cloud Integration:
* Leverage various AWS services (S3, EC2, Lambda, Glue, etc.) to build a robust and scalable data infrastructure.
* Manage and optimize AWS resources for Databricks workloads.
* Ensure secure and compliant integration between Databricks and AWS.
* Cost Optimization:
* Proactively identify and implement strategies for cost optimization across Databricks and AWS resources.
* Monitor DBU consumption, cluster utilization, and storage costs, providing recommendations for efficiency gains.
* Implement autoscaling, auto-termination, and right-sizing strategies to minimize operational expenses.
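The kind of right-sizing analysis described above often starts with back-of-the-envelope DBU arithmetic. A minimal sketch, with an assumed DBU rate per node-hour and $/DBU price (actual rates vary by instance type, workload tier, and contract):

```python
# Rough monthly DBU cost estimate for right-sizing decisions.
# dbu_per_node_hour and usd_per_dbu are illustrative placeholders.

def estimate_monthly_cost(nodes: int, hours_per_day: float,
                          dbu_per_node_hour: float = 2.0,
                          usd_per_dbu: float = 0.55,
                          days: int = 30) -> float:
    """Estimated monthly Databricks spend (DBUs only; excludes EC2/S3)."""
    dbus = nodes * hours_per_day * days * dbu_per_node_hour
    return round(dbus * usd_per_dbu, 2)

# An always-on 8-node cluster vs. the same cluster auto-terminated
# outside a 10-hour working window:
always_on = estimate_monthly_cost(nodes=8, hours_per_day=24)   # 6336.0
right_sized = estimate_monthly_cost(nodes=8, hours_per_day=10)  # 2640.0
print(always_on, right_sized)
```

Even with placeholder rates, the ratio makes the point: auto-termination alone cuts the idle-heavy cluster's DBU bill by more than half.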
* Technical Leadership & Mentoring:
* Provide technical guidance and mentorship to a team of data engineers.
* Conduct code reviews, promote coding standards, and foster a culture of continuous improvement.
* Lead technical discussions and decision-making for complex data engineering problems.
* Data Pipeline Development & Optimization:
* Develop, test, and maintain robust and efficient ETL/ELT pipelines using PySpark/Spark SQL.
* Optimize Spark jobs for performance, scalability, and resource utilization.
* Troubleshoot and resolve complex data pipeline issues.
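A typical shape for the ELT and optimization work above is an incremental upsert into a Delta table followed by file compaction. An illustrative Spark SQL sketch (table and column names are placeholders):

```sql
-- Illustrative incremental upsert into a Delta table; names are placeholders.
MERGE INTO silver.orders AS t
USING bronze.orders_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Compact small files and co-locate rows on a common filter column
-- to reduce scan cost for downstream queries.
OPTIMIZE silver.orders ZORDER BY (order_date);
```

MERGE keeps the pipeline idempotent on replays, and periodic OPTIMIZE addresses the small-file problem that streaming ingestion tends to create.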
* Collaboration:
* Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.
* Communicate technical concepts effectively to both technical and non-technical stakeholders.
Qualifications:
* Bachelor's or Master's degree in Computer Science, Data Engineering, or a related quantitative field.
* 7 years of experience in data engineering, with at least 3 years in a lead or senior role.
* Proven expertise in designing and implementing data solutions on Databricks.
* Strong hands-on experience with Apache Kafka for real-time data streaming.
* In-depth knowledge and practical experience with Databricks Unity Catalog for data governance and access control.
* Solid understanding of AWS cloud services and their application in data architectures (S3, EC2, Lambda, VPC, IAM, etc.).
* Demonstrated ability to optimize cloud resource usage and implement cost-saving strategies.
* Proficiency in Python and Spark (PySpark/Spark SQL) for data processing and analysis.
* Experience with Delta Lake and other modern data lake formats.
* Excellent problem-solving, analytical, and communication skills.
Added Advantage (Bonus Skills):
* Experience with Apache Flink for stream processing.
* Databricks certifications.
* Experience with CI/CD pipelines for Databricks deployments.
* Knowledge of other cloud platforms (Azure, GCP) is a plus.
Full-time