About the Role
Our client is looking for a Staff Data Engineer to be the steward of our data layer, ensuring that our AI and ML models have clean, structured, and high-quality data. This is an opportunity for a high-performing engineer to take ownership of our data platform, designing and building scalable ingestion, transformation, and storage solutions for a fast-growing, AI-driven sales intelligence product.
You'll build and optimize data pipelines that ingest, transform, and correlate structured and unstructured data from multiple sources (CRM systems, public datasets, web scraping). You'll work closely with ML and AI teams to ensure that our models are powered by the right data at the right time.
Why This Role
- High ownership: You'll be responsible for designing, maintaining, and evolving our data platform.
- Be the expert: You'll shape how data is structured, transformed, and optimized for ML models.
- Direct impact: Your work will power AI-driven sales recommendations for enterprise users.
Responsibilities
- Own and maintain scalable data pipelines using Python, SQL, Airflow, and Spark (Databricks).
- Develop data ingestion strategies using APIs, Airbyte, and web scraping.
- Transform and clean data for ML models using Databricks (or Spark-based systems).
- Optimize storage layers using a Medallion architecture (Bronze/Silver/Gold) approach.
- Ensure data quality, governance, and observability across all pipelines.
- Collaborate with ML, AI, and backend teams to integrate data into AI models.
- Continuously refine and improve how data is structured, stored, and served.
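For candidates less familiar with the Medallion (Bronze/Silver/Gold) approach mentioned above, the idea can be sketched in plain Python. This is a minimal illustration only: a real pipeline here would run on Spark/Databricks tables, and the record fields ("company", "revenue") are hypothetical examples, not our schema.

```python
# Minimal Medallion-architecture sketch:
#   Bronze (raw, as-ingested) -> Silver (cleaned, validated) -> Gold (aggregated, serving-ready).
# Field names ("company", "revenue") are hypothetical.

def to_silver(bronze_records):
    """Clean raw Bronze records: drop rows missing a company,
    normalize company names, and coerce revenue to float."""
    silver = []
    for rec in bronze_records:
        company = (rec.get("company") or "").strip()
        if not company:
            continue  # discard unusable raw rows
        try:
            revenue = float(rec.get("revenue", 0))
        except (TypeError, ValueError):
            revenue = 0.0  # unparseable revenue defaults to zero
        silver.append({"company": company.title(), "revenue": revenue})
    return silver

def to_gold(silver_records):
    """Aggregate cleaned Silver records into per-company totals for serving."""
    totals = {}
    for rec in silver_records:
        totals[rec["company"]] = totals.get(rec["company"], 0.0) + rec["revenue"]
    return totals

bronze = [
    {"company": " acme ", "revenue": "100.5"},
    {"company": "acme", "revenue": 50},
    {"company": None, "revenue": 10},        # dropped in Silver
    {"company": "globex", "revenue": "n/a"}, # revenue coerced to 0.0
]
gold = to_gold(to_silver(bronze))
print(gold)  # {'Acme': 150.5, 'Globex': 0.0}
```

Each layer adds guarantees: Bronze preserves raw inputs for replay, Silver enforces schema and quality rules, and Gold serves aggregates to downstream ML and analytics consumers.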
What We're Looking For
- 5 years of experience in data engineering with strong Python & SQL expertise.
- Hands-on experience with Airflow, ETL pipelines, and Spark (Databricks preferred).
- Experience integrating structured & unstructured data from APIs, CRMs, and web sources.
- Ability to own and scale data infrastructure in a fast-growing, AI-driven company.
- Strong problem-solving skills and a desire to improve how data is structured for ML.
Bonus Points
- Exposure to Golang for API development (not required, but helpful).
- Experience with MLOps (feature stores, model data versioning, SageMaker, ClearML).
- Familiarity with Terraform, Kubernetes, or data pipeline automation.
- Experience in database design to support customer-facing access patterns.