Role Overview
- We are looking for a seasoned Data Platform Architect to own the design and delivery of three foundational pillars: a resilient multi-cloud data platform, an enterprise MCP (Model Context Protocol) Server layer that connects AI workloads to governed data assets, and a high-throughput message bus capable of sustaining millions of events per second across distributed consumers. This is a senior architecture role that combines deep hands-on engineering with cross-functional influence across data, AI, and infrastructure teams.
MUST HAVEs
- Multi-Cloud Enablement
- MCP Server Foundation
- High-Throughput Message Bus
Key Responsibilities
1. Multi-Cloud Data Platform Enablement
- Architect a cloud-agnostic data platform that operates seamlessly across AWS, Azure, and GCP with unified identity, governance, and cost controls
- Define the reference architecture for lakehouse deployments (Delta Lake / Iceberg / Hudi) on each cloud, ensuring format interoperability and zero-lock-in data portability
- Design cross-cloud data movement patterns, including replication, federation, and active-active topologies, using tools such as Debezium, Airbyte, and cloud-native transfer services
- Establish a cloud-agnostic Unity Catalog or open metadata layer for consistent lineage, access control, and discoverability across all cloud zones
- Drive FinOps practices: right-sizing compute, storage tiering, and reserved capacity planning across cloud providers
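To make the portability goal concrete, here is a minimal PySpark sketch of a single ingestion write path parameterized by cloud and table format, so moving between S3, ADLS, and GCS is a configuration change rather than a code change. The bucket names, landing path, and bronze_events table are illustrative assumptions, not an existing implementation.

    # Hypothetical sketch: one bronze write path, parameterized per cloud.
    # Storage roots and table names are illustrative assumptions.
    from pyspark.sql import SparkSession

    STORAGE_ROOTS = {
        "aws":   "s3a://acme-lakehouse/bronze",                  # assumed bucket
        "azure": "abfss://bronze@acmelake.dfs.core.windows.net", # assumed container
        "gcp":   "gs://acme-lakehouse/bronze",                   # assumed bucket
    }

    def write_bronze(df, cloud: str, table: str, fmt: str = "delta"):
        """Append a bronze table to whichever cloud the platform targets."""
        path = f"{STORAGE_ROOTS[cloud]}/{table}"
        df.write.format(fmt).mode("append").save(path)  # fmt: delta / iceberg / hudi

    spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()
    events = spark.read.json("landing/events/")  # illustrative source
    write_bronze(events, cloud="aws", table="bronze_events")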
2. MCP Server Foundation
- Architect and build the enterprise MCP Server layer that exposes governed data assets, query interfaces, and tool APIs to LLM-driven agents and copilots
- Define the MCP resource taxonomy: which data assets surface as Resources, which operations become Tools, and which contextual feeds become Prompts
- Implement authentication and authorization at the MCP boundary, ensuring AI agents operate within row-level, column-level, and dataset-level access policies
- Design the MCP Server for multi-tenancy, supporting concurrent agent workloads with rate limiting, audit logging, and observability hooks
- Collaborate with AI/ML teams to validate that MCP-served context materially reduces hallucination rates and improves retrieval grounding quality
- Produce the MCP Server SDK integration guide for internal engineering teams building AI-powered applications
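For orientation on the Resource/Tool split described above, a toy sketch using the FastMCP helper from the official MCP Python SDK. The dataset URI, query tool, and stub payloads are hypothetical; a real server would wire the row- and column-level policies into every handler before returning data.

    # Toy MCP server sketch (mcp Python SDK, FastMCP helper).
    # Dataset names, URIs, and query logic are hypothetical placeholders.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("governed-data")

    @mcp.resource("dataset://sales/daily_revenue")
    def daily_revenue() -> str:
        """Surface a governed data asset as an MCP Resource."""
        return "date,revenue\n2024-01-01,10500\n"  # stub payload

    @mcp.tool()
    def run_query(sql: str) -> str:
        """Expose a governed operation as an MCP Tool.

        A production handler would enforce row/column-level access
        policies and audit-log the call before executing anything.
        """
        return f"(stub) would execute: {sql}"

    if __name__ == "__main__":
        mcp.run()  # defaults to the stdio transport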
3. High-Throughput Message Bus Architecture
- Design and own the enterprise message bus architecture targeting sustained throughput of 1M events/sec with sub-50ms end-to-end latency at P99
- Evaluate and select the appropriate messaging backbone (Apache Kafka, Confluent Platform, Redpanda, AWS Kinesis, or Azure Event Hubs) based on workload profiles
- Define partitioning strategies, topic compaction policies, retention tiers, and tiered storage configurations aligned to IoT telemetry, CDC, and operational event patterns
- Architect Schema Registry governance, including schema evolution contracts (Avro / Protobuf / JSON Schema) and compatibility enforcement pipelines
- Design consumer group topologies for stream processing frameworks (Flink, Spark Structured Streaming, Delta Live Tables) and ensure backpressure and offset management are production-grade
- Integrate the message bus with the multi-cloud lakehouse as a bronze ingestion layer, enforcing idempotency and exactly-once delivery guarantees
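As a concrete anchor for the idempotency bullet, a minimal confluent-kafka producer configured for idempotent, fully acknowledged writes. The broker address, topic, and keying scheme are assumptions; true end-to-end exactly-once additionally requires transactional producers coordinated with consumer offsets.

    # Minimal idempotent Kafka producer sketch (confluent-kafka client).
    # Broker address, topic name, and keying scheme are assumptions.
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "broker-1:9092",  # assumed cluster endpoint
        "enable.idempotence": True,            # broker de-duplicates retries
        "acks": "all",                         # wait for the full ISR
        "linger.ms": 5,                        # small batching window
        "compression.type": "lz4",             # cheap throughput win
    })

    def on_delivery(err, msg):
        if err is not None:
            print(f"delivery failed: {err}")

    for i in range(1000):
        producer.produce(
            "telemetry.bronze",   # assumed topic
            key=str(i % 32),      # stable keys keep partition affinity
            value=f'{{"event_id": {i}}}',
            on_delivery=on_delivery,
        )
    producer.flush()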
4. Platform Governance & Engineering Excellence
- Define and enforce platform-wide standards: naming conventions, tagging taxonomy, SLA tiers, DR objectives, and run-book templates
- Champion Infrastructure-as-Code practices across Terraform, Pulumi, or Bicep for all cloud resources and data platform components
- Lead architecture review boards (ARBs) and own the technical decision log (ADRs) for all major platform choices
- Mentor senior data engineers and serve as the escalation point for platform-level production incidents
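One small illustration of codifying those standards with Pulumi's Python SDK (one of the IaC tools named above): a storage bucket whose name and tags come from shared platform conventions rather than ad hoc values. The org prefix, tag keys, and SLA taxonomy are assumptions for the sketch.

    # Pulumi (Python) sketch: naming and tagging standards enforced in code.
    # Org prefix, tag keys, and environment names are assumptions.
    import pulumi
    import pulumi_aws as aws

    STANDARD_TAGS = {
        "platform": "data-platform",
        "sla-tier": "gold",              # assumed SLA taxonomy
        "owner":    "data-platform-team",
    }

    def platform_name(component: str, env: str) -> str:
        """Apply the assumed naming convention: <org>-<env>-<component>."""
        return f"acme-{env}-{component}"

    bronze_bucket = aws.s3.Bucket(
        platform_name("bronze", "prod"),
        tags={**STANDARD_TAGS, "layer": "bronze"},
    )

    pulumi.export("bronze_bucket_name", bronze_bucket.id)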
Requirements
Required Qualifications
- 10 years in data engineering with at least 3 years in platform or solutions architecture roles
- Hands-on experience architecting production lakehouse platforms on two or more of: AWS (S3, Glue, Athena, EMR, Kinesis), Azure (ADLS Gen2, Databricks, Event Hubs, Synapse), GCP (BigQuery, Dataflow, Pub/Sub)
- Deep expertise in Apache Kafka or an equivalent message bus: cluster sizing, partition leadership, consumer lag management, and MirrorMaker 2 / replication topologies
- Strong command of open table formats: Delta Lake, Apache Iceberg, or Apache Hudi, including time travel, merge-on-read vs. copy-on-write trade-offs, and OPTIMIZE / VACUUM strategies
- Proficiency in Python and PySpark for platform automation, ingestion framework development, and schema validation pipelines
- Demonstrated experience with metadata management: Apache Atlas, Unity Catalog, DataHub, or equivalent open metadata solutions
- Familiarity with the MCP specification or equivalent AI tool-use protocols; experience building or integrating API layers consumed by LLM agents is a strong plus
- Infrastructure-as-Code fluency (Terraform, Pulumi, or equivalent) and CI/CD pipeline design for data platform deployments
- Strong written communication skills: ability to produce architecture decision records, RFP responses, and client-facing implementation guides
Preferred Qualifications
- Experience with Delta Live Tables (DLT) in Databricks, including CDC pipeline design and Liquid Clustering optimization
- Exposure to vector databases (Pinecone, Weaviate, pgvector) and RAG pipeline architecture for grounding LLMs in enterprise data
- Familiarity with Redpanda or Confluent Cloud as managed Kafka alternatives and their cost/performance trade-offs at scale
- Knowledge of data mesh operating models: domain ownership, data products, and federated governance
- Experience in regulated industries (energy, manufacturing, IoT telemetry) where data quality, auditability, and retention policies are mission-critical
- Cloud certifications: AWS Data Analytics Specialty, Azure Data Engineer Associate, GCP Professional Data Engineer, or Databricks Certified Data Engineer Professional
- Prior consulting or multi-client engagement experience; comfort navigating multiple concurrent stakeholder environments
Required Skills:
Data Platform Architect
Required Education:
Master's degree