Data Platform Engineer - Cloud Ops + Data Ops (R1012521)

YASMESOFT INC

Job Location:

Plano, TX - USA

Monthly Salary: Not Disclosed
Posted on: 1 hour ago
Vacancies: 1 Vacancy

Job Summary

Industry Group: Automotive.

Job Title: Data Platform Engineer, Cloud Ops / Data Ops - R1012521

Location: Plano, TX (local to the Dallas area; onsite at the client office 3 days/week)
Duration: 12-month contract (potential for extension)

Pay Rate: $70 - $75

Custom Skill Requirements:

  • Data Platform Engineer: Cloud Ops / Data Ops
  • PySpark
  • AWS
  • Cloud
  • DevOps: CI/CD
  • Databricks administration

Qualifying Questions:

  1. Have you worked on Kubernetes?
  2. Do you have PySpark experience?
  3. Do you have AWS cloud experience?
  4. Are you able to work with offshore teams?

Job Description:

As a Data Platform Engineer, you will be responsible for the design, development, and maintenance of our high-scale, cloud-based data platform, treating data as a strategic product. You will lead the implementation of robust, optimized data pipelines using PySpark and the Databricks Unified Analytics Platform, leveraging its full ecosystem for Data Engineering, Data Science, and ML workflows. You will also establish best-in-class DevOps practices using CI/CD and GitHub Actions to ensure automated deployment and reliability. This role demands expertise in large-scale data processing and a commitment to modern, scalable data engineering and AWS cloud infrastructure practices.
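For context on the PySpark and Databricks stack described above, here is a minimal batch-pipeline sketch (not part of the posting). It assumes a Databricks job or notebook where PySpark and the Delta format are available; the S3 paths, column names, and app name (example-bucket, order_id, order_ts, orders-curation) are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    # On Databricks this returns the existing session; locally it creates one.
    spark = SparkSession.builder.appName("orders-curation").getOrCreate()

    raw_path = "s3://example-bucket/raw/orders/"          # hypothetical source
    curated_path = "s3://example-bucket/curated/orders/"  # hypothetical target

    # Ingest raw JSON, apply light cleansing, and land the result as a Delta table.
    raw_df = spark.read.json(raw_path)

    curated_df = (
        raw_df
        .dropDuplicates(["order_id"])                      # hypothetical business key
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .filter(F.col("order_id").isNotNull())
    )

    curated_df.write.format("delta").mode("overwrite").save(curated_path)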

Key Responsibilities:

  1. Platform Development: Design, build, and maintain scalable, efficient, and reliable ETL/ELT data pipelines to support data ingestion, transformation, and integration across diverse sources.
  2. Big Data Implementation: Serve as the subject matter expert for the Databricks environment, developing high-performance data transformation logic primarily using PySpark and Python. This includes utilizing Delta Live Tables (DLT) for declarative pipeline construction and ensuring governance through Unity Catalog (see the DLT sketch after this list).
  3. Cloud Infrastructure Management: Configure, maintain, and secure the underlying AWS cloud infrastructure required to run the Databricks platform, including virtual private clouds (VPCs), network endpoints, storage (S3), and cross-account access mechanisms.
  4. DevOps & Automation (CI/CD): Own and enforce Continuous Integration/Continuous Deployment (CI/CD) practices for the data platform. Specifically, design and implement automated deployment workflows using GitHub Actions and modern infrastructure-as-code concepts to deploy Databricks assets (Notebooks, Jobs, DLT Pipelines, and Repos).
  5. Data Quality & Testing: Design and implement automated unit, integration, and performance testing frameworks to ensure data quality, reliability, and compliance with architectural standards.
  6. Performance Optimization: Optimize data workflows and cluster configurations for performance, cost efficiency, and scalability across massive datasets.
  7. Technical Leadership: Provide technical guidance on data principles, patterns, and best practices (e.g., Medallion Architecture, ACID compliance) to promote team capabilities and maturity. This includes leveraging Databricks SQL for high-performance analytics.
  8. Documentation & Review: Draft and review architectural diagrams, design documents, and interface specifications to ensure clear communication of data solutions and technical requirements.
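
To illustrate responsibility 2 (declarative pipelines with Delta Live Tables) in the Medallion bronze/silver style, here is a minimal sketch. It assumes execution inside a Databricks DLT pipeline, where the dlt module and the spark session are provided by the runtime; the landing path, table names, and columns are hypothetical.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Bronze: raw orders ingested as-is from cloud storage.")
    def orders_bronze():
        # Auto Loader incrementally picks up new files from the landing path.
        return (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("s3://example-bucket/landing/orders/"))  # hypothetical path

    @dlt.table(comment="Silver: cleansed orders with basic quality expectations.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_silver():
        return (dlt.read_stream("orders_bronze")
                .withColumn("order_ts", F.to_timestamp("order_ts"))
                .dropDuplicates(["order_id"]))

Expectations such as expect_or_drop attach row-level quality rules to the table, which is one way the data quality and testing responsibility above can be enforced directly in the pipeline.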

Required Qualifications:

  • Experience: 5 years of professional experience in Data Engineering, focusing on building scalable data platforms and production pipelines.
  • Big Data Expertise: Minimum 3 years of hands-on experience developing, deploying, and optimizing solutions within the Databricks ecosystem. Deep expertise required in:
      • Delta Lake (ACID transactions, time travel, optimization).
      • Unity Catalog (data governance, access control, metadata management).
      • Delta Live Tables (DLT) (declarative pipeline development).
      • Databricks Workspaces, Repos, and Jobs.
      • Databricks SQL for analytics and warehouse operations.
  • AWS Infrastructure & Security: Proven hands-on experience (3 years) with core AWS services and infrastructure components, including:
      • Networking: Configuring and securing VPCs, VPC Endpoints, Subnets, and Route Tables for private connectivity.
      • Security & Access: Defining and managing IAM Roles and Policies for secure cross-account access and least-privilege access to data.
      • Storage: Deep knowledge of Amazon S3 for data lake implementation and governance.
  • Programming: Expert proficiency (4 years) in Python for data manipulation, scripting, and pipeline development.
  • Spark & SQL: Deep understanding of distributed computing and extensive experience (3 years) with PySpark and advanced SQL for complex data transformation and querying.
  • DevOps & CI/CD: Proven experience (2 years) designing and implementing CI/CD pipelines, including proficiency with GitHub Actions or similar tools (e.g., GitLab CI, Jenkins) for automated testing and deployment (a deployment sketch follows this list).
  • Data Concepts: Full understanding of ETL/ELT, Data Warehousing, and Data Lake concepts.
  • Methodology: Strong grasp of Agile principles (Scrum).
  • Version Control: Proficiency with Git for version control.
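
As a rough illustration of the CI/CD expectation above (automated deployment of Databricks assets from GitHub Actions), the sketch below shows one common approach: a Python script, runnable as a workflow step, that registers a job through the Databricks Jobs 2.1 REST API. The workspace host and token environment variables, job name, notebook path, and cluster settings are hypothetical and would normally come from CI secrets and configuration files.

    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]  # injected as a GitHub Actions secret

    # Hypothetical job spec: one notebook task on a small job cluster.
    job_spec = {
        "name": "orders-curation-nightly",
        "tasks": [
            {
                "task_key": "curate_orders",
                "notebook_task": {"notebook_path": "/Repos/data-platform/pipelines/curate_orders"},
                "new_cluster": {
                    "spark_version": "15.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
            }
        ],
    }

    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=job_spec,
        timeout=60,
    )
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])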

Preferred Qualifications:

  • AWS Data Ecosystem Experience: Familiarity and experience with AWS cloud-native data services such as AWS Glue, Amazon Athena, Amazon Redshift, Amazon RDS, and Amazon DynamoDB.
  • Knowledge of real-time or near-real-time streaming technologies (e.g., Kafka, Spark Structured Streaming); see the streaming sketch after this list.
  • Experience in developing feature engineering pipelines for machine learning (ML) consumption.
  • Background in performance tuning and capacity planning for large Spark clusters.
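
For the streaming item above, a minimal Spark Structured Streaming sketch reading from Kafka and writing to Delta. The broker address, topic, and storage paths are hypothetical, and the Kafka connector is assumed to be available on the cluster.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-stream").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
              .option("subscribe", "orders")                     # hypothetical topic
              .option("startingOffsets", "latest")
              .load())

    # Kafka values arrive as bytes; cast to string before downstream parsing.
    parsed = events.select(F.col("value").cast("string").alias("payload"))

    query = (parsed.writeStream
             .format("delta")
             .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
             .start("s3://example-bucket/streams/orders/"))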

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala