JD-
Data Platform & Architecture
- Design and evolve cloud-native data lakes / warehouses (e.g. Snowflake, Databricks, BigQuery).
- Establish scalable batch & streaming pipelines using Spark/Flink, Kafka, Airflow/Dagster, and dbt.
- Implement robust data-quality, catalog, and governance frameworks (e.g. Great Expectations, Unity Catalog).
MLOps & Model Lifecycle
- Build automated CI/CD pipelines for ML (MLflow, Kubeflow, SageMaker, Vertex AI).
- Set up feature stores, model registries, and canary rollout processes.
- Create monitoring & alerting for drift, bias, and performance (Prometheus, Evidently, Arize).
Leadership & Delivery
- Recruit, coach, and promote a high-performing team of data engineers, ML engineers, and DevOps specialists.
- Drive quarterly OKRs, roadmaps, and architectural review boards.
- Manage budgets, vendor contracts, and cloud cost optimization.
Security, Compliance & Governance
- Enforce IAM, data-encryption, and least-privilege practices.
- Ensure adherence to GDPR, PDPA, HIPAA, or other relevant regulations.
- Champion reproducibility and auditability across data and ML assets.
Innovation & Thought Leadership
- Evaluate emerging paradigms like data mesh, vector databases, LLMOps, and GenAI for business fit.
- Publish best-practice playbooks and present at internal tech forums or external meet-ups.
Required Qualifications
- 8 years combined experience in data engineering, software engineering, or ML infrastructure, with 3 years leading teams.
- Deep proficiency with Python/Scala/SQL and modern data processing frameworks (Spark, Flink).
- Hands-on with Docker, Kubernetes, Terraform, and CI/CD (GitHub Actions, Jenkins).
- Proven record of shipping and operating ML models in production at scale.
- Solid grasp of distributed-system design, data modeling, and micro-service architectures.
- Excellent stakeholder management and communication skills.
Preferred / Bonus Points
- Experience in GenAI or LLM pipelines and vector similarity search (FAISS, Pinecone, Weaviate).
- Multi-cloud (AWS, GCP, Azure) certification or FinOps expertise.
- Contributions to open-source data or MLOps projects.
- Familiarity with privacy-preserving ML (federated learning, differential privacy).
Success Metrics (First 12 Months)
- Reduce model deployment lead-time from commit to production to < 24 hours.
- Achieve 99.9% uptime for core data pipelines.
- Launch a unified feature store serving at least 3 flagship ML products.
- Hire and onboard 4 engineers with < 90-day ramp-up.