Job Description
Cloud-Native Data Engineering on AWS
- Strong hands-on expertise in AWS-native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), SageMaker (incl. Feature Store)
- Comfort designing and optimizing pipelines for both batch (Step Functions) and streaming (Kinesis/MSK) ingestion.
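The batch-versus-streaming distinction above can be illustrated with a minimal, vendor-neutral sketch. This is plain Python standing in for a Step Functions batch job and a Kinesis/MSK consumer; all function names here are hypothetical, not AWS APIs:

```python
from collections import deque

def batch_ingest(records, transform):
    """Batch style: process the full dataset in one scheduled run."""
    return [transform(r) for r in records]

def stream_ingest(source, transform, window=3):
    """Streaming style: consume records one at a time and emit results
    per micro-window as the window fills (Kinesis/MSK consumer shape)."""
    buf = deque()
    for record in source:
        buf.append(transform(record))
        if len(buf) == window:
            yield list(buf)
            buf.clear()
    if buf:  # flush the final partial window
        yield list(buf)

# Same transformation, both modes: uppercase event names
events = ["click", "view", "buy", "view", "click"]
print(batch_ingest(events, str.upper))
print(list(stream_ingest(iter(events), str.upper, window=2)))
```

The practical difference the sketch surfaces: batch sees the whole dataset at once, while streaming must decide windowing and late-flush behavior up front.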
Data Mesh & Distributed Architectures
- Deep understanding of data mesh principles, including domain-oriented ownership, treating data as a product, and federated governance models
- Experience enabling self-service platforms and decentralized ingestion and transformation workflows.
Data Contracts & Schema Management
- Advanced knowledge of schema enforcement, evolution, and validation (preferably AWS Glue Schema Registry with JSON Schema/Avro)
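As a concrete instance of the schema-evolution rule above, here is a minimal sketch of the Avro-style backward-compatibility check (new readers must still decode old records, so every added field needs a default). The field-dict shape is hypothetical, not the Glue Schema Registry API:

```python
def is_backward_compatible(old_fields, new_fields):
    """Avro-style rule of thumb: a new schema stays backward compatible
    only if every field it adds carries a default value.
    Fields are {name: {"type": ..., "default": ...?}} dicts."""
    for name, spec in new_fields.items():
        if name not in old_fields and "default" not in spec:
            return False
    return True

v1 = {"id": {"type": "long"}, "email": {"type": "string"}}
v2_ok = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_bad = {**v1, "country": {"type": "string"}}  # no default: old records break

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
```

A registry configured for BACKWARD compatibility would reject the second evolution at publish time instead of letting consumers fail at read time.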
Data Transformation & Modelling
- Proficiency with the modern ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas)
AI/ML Data Enablement
- Designing and supporting vector stores (OpenSearch), feature stores (SageMaker Feature Store), and integrating with MLOps/data pipelines for AI/semantic search and RAG-style workloads
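The retrieval step behind the semantic-search/RAG workloads mentioned above reduces to nearest-neighbor search over embeddings. A toy cosine-similarity ranking, standing in for an OpenSearch k-NN query (document IDs and 3-dimensional vectors are illustrative only):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, store, k=2):
    """Rank documents in store ({doc_id: embedding}) by similarity to the
    query vector, as a vector store would at retrieval time."""
    return sorted(store, key=lambda d: cosine(query_vec, store[d]), reverse=True)[:k]

docs = {
    "invoice-faq": [0.9, 0.1, 0.0],
    "hr-policy":   [0.1, 0.9, 0.2],
    "api-guide":   [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

In a real RAG pipeline the returned document IDs would be resolved to text chunks and passed to the model as context.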
Metadata Catalog and Lineage
- Familiarity with central cataloging, lineage solutions, and data discovery tools (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.)
- Implementing end-to-end lineage, auditability, and governance processes.
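End-to-end lineage, at its core, is a graph walk: given a dataset, find every transitive upstream input for audit or impact analysis. A minimal sketch with a hypothetical dataset-to-inputs mapping:

```python
def upstream(lineage, dataset, seen=None):
    """Walk the lineage graph (dataset -> list of direct inputs) and return
    every transitive upstream source of the given dataset."""
    seen = set() if seen is None else seen
    for parent in lineage.get(dataset, []):
        if parent not in seen:
            seen.add(parent)
            upstream(lineage, parent, seen)
    return seen

# Toy lineage: a report built from a mart that joins two raw tables
lineage = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["raw_orders", "raw_customers"],
}
print(sorted(upstream(lineage, "sales_report")))
```

Catalog tools such as those listed above maintain this graph automatically from job metadata; the traversal itself is what answers "what breaks if this table changes?"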
Security, Compliance, and Data Governance
- Design and implementation of data security: row-/column-level security (Lake Formation), KMS encryption, role-based access using AuthN/AuthZ standards (JWT/OIDC), and GDPR/SOC 2/ISO 27001-aligned policies
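The row-/column-level security model above can be sketched as a policy applied at read time: drop columns the role may not see, and filter rows by predicate. This is a plain-Python illustration of the concept, not the Lake Formation API; all names are hypothetical:

```python
def apply_policy(rows, allowed_columns, row_filter=lambda r: True):
    """Emulate cell-level filtering: keep only permitted columns, and only
    rows that pass the caller's row-level predicate."""
    return [
        {k: v for k, v in row.items() if k in allowed_columns}
        for row in rows if row_filter(row)
    ]

rows = [
    {"id": 1, "region": "EU", "email": "a@x.io"},
    {"id": 2, "region": "US", "email": "b@x.io"},
]
# An analyst role: no PII columns, EU rows only
print(apply_policy(rows, {"id", "region"}, lambda r: r["region"] == "EU"))
```

In Lake Formation the equivalent rules live in data filters attached to the table, enforced by the query engine rather than application code.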
Orchestration & Observability
- Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray) in large-scale environments.
APIs & Integration
- API design for both batch and real-time data delivery (REST and GraphQL endpoints for AI/reporting/BI consumption)
Job Responsibilities
- Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms.
- Develop and manage data architectures, data lakes, and data warehouses on AWS (e.g., S3, Redshift, Glue, Athena).
- Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security.
- Optimize the performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines.
- Work with both structured and unstructured data, ensuring reliability and scalability.
- Implement data governance, security, and compliance standards.
- Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure as Code (IaC).