Senior Data Engineer – Feature Store, ML Platform & Performance Optimization

Techtriad Team Inc

Job Location:

Washington, AR - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Senior Data Engineer - GC/USC only

Washington, DC & New York (6-8 month contract, possible extensions)

About the Role

We are seeking an experienced Senior Data Engineer to design, build, and optimize the data systems that power our machine learning models, segmentation analytics, and measurement pipelines.

This hands-on role will focus on building and maintaining our Feature Store, creating robust data pipelines to support training and inference workflows, and ensuring that our ML and analytics code runs efficiently and at scale across Databricks and AWS environments.

You'll collaborate closely with data scientists, ML engineers, and product teams to turn analytical concepts into production-ready data and model pipelines that drive personalization, audience targeting, and performance insights across the business.

What You'll Do

Design, build, and optimize high-performance data pipelines and feature store components using Databricks (PySpark, Delta Lake, SQL) and AWS (S3, Lambda, Glue, Kinesis, EMR).

Develop and maintain a centralized Feature Store to manage and serve machine learning features consistently across training and inference environments.

Build data pipelines for audience segmentation, measurement, and model performance tracking, ensuring accuracy and scalability.

Optimize existing code and pipelines for performance, cost efficiency, and maintainability, reducing compute time, improving Spark job performance, and minimizing data latency.

Collaborate with data scientists to productionize ML models, including model ingestion, transformation, and deployment pipelines.

Implement CI/CD workflows for data and ML deployments using Databricks Workflows, GitHub Actions, or similar automation tools.

Develop and enforce data quality, lineage, and observability frameworks (e.g., Great Expectations, Monte Carlo, Soda).

Work with cloud infrastructure teams to ensure reliable production environments and efficient resource utilization.

Contribute to code reviews, documentation, and data architecture design, promoting best practices for performance and scalability.

Stay current with Databricks, Spark, and AWS ecosystem updates to continuously improve platform efficiency.

What You'll Need

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

7-10 years of experience in data engineering or distributed data systems development.

Deep expertise with Databricks (PySpark, Delta Lake, SQL) and strong experience with AWS (S3, Glue, EMR, Kinesis, Lambda).

Experience designing and building Feature Stores (Databricks Feature Store, Feast, or similar).

Proven ability to profile and optimize data processing code, including Spark tuning, partitioning strategies, and efficient data I/O.

Strong programming skills in Python (preferred) or Scala/Java, with an emphasis on writing performant, production-ready code.

Experience with batch and streaming pipelines, real-time data processing, and large-scale distributed computing.

Familiarity with ML model deployment and monitoring workflows (MLflow, SageMaker, custom frameworks).

Familiarity with ML model development using libraries such as scikit-learn, TensorFlow, or PyTorch.

Working knowledge of data quality frameworks, CI/CD, and infrastructure-as-code.

Excellent problem-solving and communication skills; able to collaborate across technical and product domains.

Preferred Qualifications

Experience with Databricks Unity Catalog, Delta Live Tables, and MLflow.

Understanding of segmentation, targeting, and personalization pipelines.

Experience with data observability and monitoring tools (Monte Carlo, Databand, etc.).

Familiarity with NoSQL or real-time stores (DynamoDB, Druid, Redis, etc.) for feature serving.

Exposure to containerization and orchestration (Docker, Kubernetes, Airflow, Dagster).

Strong understanding of data performance optimization principles: caching, partitioning, vectorization, and adaptive query execution.

Key Skills

  • JProfiler
  • Splunk
  • Performance Testing
  • Fiddler
  • Apache
  • HP Performance Center
  • LoadRunner
  • New Relic
  • Scalability
  • J2EE
  • Java
  • Scripting