Industry Group: Automotive.
Job Title: Data Platform Engineer - Cloud Ops / Data Ops (R1012521)
Location: Plano, TX (local to the Dallas area; onsite at the client office 3 days/week)
Duration: 12-month contract (potential for extension)
Pay Rate: $70 - $75
Custom Skill Requirements:
- Data Platform Engineer: Cloud Ops / Data Ops
- PySpark
- AWS
- Cloud
- DevOps: CI/CD
- Databricks administration
Qualifying Questions:
- Have you worked with Kubernetes?
- Do you have PySpark experience?
- Do you have AWS cloud experience?
- Are you able to work with offshore teams?
Job Description:
As a Data Platform Engineer, you will be responsible for the design, development, and maintenance of our high-scale, cloud-based data platform, treating data as a strategic product. You will lead the implementation of robust, optimized data pipelines using PySpark and the Databricks Unified Analytics Platform, leveraging its full ecosystem for Data Engineering, Data Science, and ML workflows. You will also establish best-in-class DevOps practices using CI/CD and GitHub Actions to ensure automated deployment and reliability. This role demands expertise in large-scale data processing and a commitment to modern, scalable data engineering and AWS cloud infrastructure practices.
Key Responsibilities:
- Platform Development: Design, build, and maintain scalable, efficient, and reliable ETL/ELT data pipelines to support data ingestion, transformation, and integration across diverse sources.
- Big Data Implementation: Serve as the subject matter expert for the Databricks environment, developing high-performance data transformation logic primarily using PySpark and Python. This includes utilizing Delta Live Tables (DLT) for declarative pipeline construction and ensuring governance through Unity Catalog (an illustrative pipeline sketch follows this list).
- Cloud Infrastructure Management: Configure, maintain, and secure the underlying AWS cloud infrastructure required to run the Databricks platform, including virtual private clouds (VPCs), network endpoints, storage (S3), and cross-account access mechanisms.
- DevOps & Automation (CI/CD): Own and enforce Continuous Integration/Continuous Deployment (CI/CD) practices for the data platform. Specifically, design and implement automated deployment workflows using GitHub Actions and modern infrastructure-as-code concepts to deploy Databricks assets (Notebooks, Jobs, DLT Pipelines, and Repos).
- Data Quality & Testing: Design and implement automated unit, integration, and performance testing frameworks to ensure data quality, reliability, and compliance with architectural standards.
- Performance Optimization: Optimize data workflows and cluster configurations for performance, cost efficiency, and scalability across massive datasets.
- Technical Leadership: Provide technical guidance on data principles, patterns, and best practices (e.g., Medallion Architecture, ACID compliance) to promote team capabilities and maturity. This includes leveraging Databricks SQL for high-performance analytics.
- Documentation & Review: Draft and review architectural diagrams, design documents, and interface specifications to ensure clear communication of data solutions and technical requirements.
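For context on the Delta Live Tables responsibility above, the following is a minimal, illustrative sketch of a declarative DLT pipeline, not a prescribed implementation for this role. It assumes the dlt module available inside a Databricks DLT pipeline and the spark session provided by the runtime; the bucket path, table names, and data-quality expectation are placeholders.

import dlt
from pyspark.sql import functions as F

# `spark` is provided automatically by the Databricks/DLT runtime.

@dlt.table(comment="Raw orders ingested from a landing zone (bronze). Placeholder source.")
def orders_bronze():
    # Auto Loader incrementally picks up new files from cloud storage.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://example-landing-bucket/orders/")  # placeholder path
    )

@dlt.table(comment="Cleaned, deduplicated orders (silver).")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_silver():
    # Read the bronze table declared above and apply basic cleanup.
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .dropDuplicates(["order_id"])
    )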
Required Qualifications:
- Experience: 5 years of professional experience in Data Engineering, focusing on building scalable data platforms and production pipelines.
- Big Data Expertise: Minimum of 3 years of hands-on experience developing, deploying, and optimizing solutions within the Databricks ecosystem.
- Deep expertise required in:
- Delta Lake (ACID transactions, time travel, optimization).
- Unity Catalog (data governance, access control, metadata management).
- Delta Live Tables (DLT) (declarative pipeline development).
- Databricks Workspaces, Repos, and Jobs.
- Databricks SQL for analytics and warehouse operations.
- AWS Infrastructure & Security: Proven hands-on experience (3 years) with core AWS services and infrastructure components, including:
- Networking: Configuring and securing VPCs, VPC Endpoints, Subnets, and Route Tables for private connectivity.
- Security & Access: Defining and managing IAM Roles and Policies for secure cross-account access and least-privilege access to data.
- Storage: Deep knowledge of Amazon S3 for data lake implementation and governance.
- Programming: Expert proficiency (4 years) in Python for data manipulation, scripting, and pipeline development.
- Spark & SQL: Deep understanding of distributed computing and extensive experience (3 years) with PySpark and advanced SQL for complex data transformation and querying (an illustrative sketch follows this list).
- DevOps & CI/CD: Proven experience (2 years) designing and implementing CI/CD pipelines, including proficiency with GitHub Actions or similar tools (e.g., GitLab CI, Jenkins) for automated testing and deployment.
- Data Concepts: Full understanding of ETL/ELT, Data Warehousing, and Data Lake concepts.
- Methodology: Strong grasp of Agile principles (Scrum).
- Version Control: Proficiency with Git for version control.
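To illustrate the kind of PySpark and advanced SQL work described under "Spark & SQL" above, here is a small, hedged sketch combining the DataFrame API with a Spark SQL window function. The table path, column names, and logic are placeholders, not project specifics.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

# Load a (placeholder) Delta table and expose it to Spark SQL.
orders = spark.read.format("delta").load("s3://example-bucket/silver/orders")
orders.createOrReplaceTempView("orders")

# Latest order per customer via an analytic (window) function in SQL.
latest_orders = spark.sql("""
    SELECT customer_id, order_id, order_ts
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
        FROM orders
    )
    WHERE rn = 1
""")

# Equivalent-style aggregation using the DataFrame API.
order_totals = (
    orders.groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("order_id").alias("order_count"))
)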
Preferred Qualifications:
- AWS Data Ecosystem Experience: Familiarity and experience with AWS cloud-native data services such as AWS Glue, Amazon Athena, Amazon Redshift, Amazon RDS, and Amazon DynamoDB.
- Knowledge of real-time or near-real-time streaming technologies (e.g., Kafka, Spark Structured Streaming); a minimal streaming sketch follows this list.
- Experience in developing feature engineering pipelines for machine learning (ML) consumption.
- Background in performance tuning and capacity planning for large Spark clusters.
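As an illustration of the streaming technologies mentioned above, the following is a minimal Spark Structured Streaming sketch that reads from Kafka and appends to a Delta target. It assumes the Kafka connector is available on the cluster classpath; the broker list, topic, and storage paths are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Near-real-time ingestion from a (placeholder) Kafka topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
    .option("subscribe", "orders-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka records arrive as binary key/value; cast to strings for downstream parsing.
parsed = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Append the raw events to a (placeholder) Delta location with checkpointing.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders_events")
    .outputMode("append")
    .start("s3://example-bucket/bronze/orders_events")
)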