Senior Data Engineer - GC/USC only
Washington, DC & New York (6-8 month contract, with possible extensions)
About the Role
We are seeking an experienced Senior Data Engineer to design, build, and optimize the data systems that power our machine learning models, segmentation analytics, and measurement pipelines.
This hands-on role will focus on building and maintaining our Feature Store, creating robust data pipelines to support training and inference workflows, and ensuring that our ML and analytics code runs efficiently and at scale across Databricks and AWS environments.
You'll collaborate closely with data scientists, ML engineers, and product teams to turn analytical concepts into production-ready data and model pipelines that drive personalization, audience targeting, and performance insights across the business.
What You'll Do
Design, build, and optimize high-performance data pipelines and feature store components using Databricks (PySpark, Delta Lake, SQL) and AWS (S3, Lambda, Glue, Kinesis, EMR).
Develop and maintain a centralized Feature Store to manage and serve machine learning features consistently across training and inference environments.
Build data pipelines for audience segmentation, measurement, and model performance tracking, ensuring accuracy and scalability.
Optimize existing code and pipelines for performance, cost efficiency, and maintainability - reducing compute time, improving Spark job performance, and minimizing data latency.
Collaborate with data scientists to productionize ML models, including model ingestion, transformation, and deployment pipelines.
Implement CI/CD workflows for data and ML deployments using Databricks Workflows, GitHub Actions, or similar automation tools.
Develop and enforce data quality, lineage, and observability frameworks (e.g., Great Expectations, Monte Carlo, Soda).
Work with cloud infrastructure teams to ensure reliable production environments and efficient resource utilization.
Contribute to code reviews, documentation, and data architecture design, promoting best practices for performance and scalability.
Stay current with Databricks, Spark, and AWS ecosystem updates to continuously improve platform efficiency.
What You'll Need
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
7-10 years of experience in data engineering or distributed data systems development.
Deep expertise with Databricks (PySpark, Delta Lake, SQL) and strong experience with AWS (S3, Glue, EMR, Kinesis, Lambda).
Experience designing and building Feature Stores (Databricks Feature Store, Feast, or similar).
Proven ability to profile and optimize data processing code, including Spark tuning, partitioning strategies, and efficient data I/O.
Strong programming skills in Python (preferred) or Scala/Java, with an emphasis on writing performant, production-ready code.
Experience with batch and streaming pipelines, real-time data processing, and large-scale distributed computing.
Familiarity with ML model deployment and monitoring workflows (MLflow, SageMaker, custom frameworks).
Familiarity with ML model development using libraries such as scikit-learn, TensorFlow, or PyTorch.
Working knowledge of data quality frameworks, CI/CD, and infrastructure-as-code.
Excellent problem-solving and communication skills; able to collaborate across technical and product domains.
Preferred Qualifications
Experience with Databricks Unity Catalog, Delta Live Tables, and MLflow.
Understanding of segmentation, targeting, and personalization pipelines.
Experience with data observability and monitoring tools (Monte Carlo, Databand, etc.).
Familiarity with NoSQL or real-time stores (DynamoDB, Druid, Redis, etc.) for feature serving.
Exposure to containerization and orchestration (Docker, Kubernetes, Airflow, Dagster).
Strong understanding of data performance optimization principles - caching, partitioning, vectorization, and adaptive query execution.