Data Engineer

Datamaxis


Job Location: Bengaluru - India
Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Description
Cloud-Native Data Engineering on AWS

  • Strong hands-on expertise in AWS native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), SageMaker (incl. Feature Store).
  • Comfort designing and optimizing pipelines for both batch (Step Functions) and streaming (Kinesis/MSK) ingestion.

Data Mesh & Distributed Architectures

  • Deep understanding of data mesh principles, including domain-oriented ownership, treating data as a product, and the use of federated governance models.
  • Experience enabling self-service platforms and decentralized ingestion and transformation workflows.

Data Contracts & Schema Management

  • Advanced knowledge of schema enforcement, evolution, and validation (preferably AWS Glue Schema Registry, JSON/Avro).

Data Transformation & Modelling

  • Proficiency with the modern ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas).
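
The schema enforcement and validation requirement above would in practice go through something like the AWS Glue Schema Registry with Avro or JSON Schema. The underlying idea can be sketched as a toy in-process data-contract check; the field names and types below are hypothetical, not taken from the posting:

```python
# Hypothetical minimal data contract for an "orders" event stream.
ORDER_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "amount": float,
    "currency": str,
}

def validate_record(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations (empty means the record conforms)."""
    errors = []
    for field, expected_type in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

good = {"order_id": "o-1", "customer_id": "c-9", "amount": 19.99, "currency": "EUR"}
bad = {"order_id": "o-2", "amount": "19.99"}

print(validate_record(good, ORDER_CONTRACT))  # []
print(validate_record(bad, ORDER_CONTRACT))
```

A real registry additionally handles schema *evolution* (compatibility checks between versions), which a static dict like this does not capture.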


AI/ML Data Enablement

  • Designing and supporting vector stores (OpenSearch), feature stores (SageMaker Feature Store), and integrating with MLOps/data pipelines for AI/semantic search and RAG-type workloads.

Metadata Catalog and Lineage

  • Familiarity with central cataloging, lineage solutions, and data discovery (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.).
  • Implementing end-to-end lineage, auditability, and governance processes.

Security, Compliance, and Data Governance

  • Design and implementation of data security: row/column-level security (Lake Formation), KMS encryption, role-based access using AuthN/AuthZ standards (JWT/OIDC), and GDPR/SOC2/ISO 27001-aligned policies.

Orchestration & Observability

  • Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray) in large-scale environments.
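
The orchestration and observability expectations above combine two ideas that Step Functions and Airflow both provide out of the box: retries with backoff, and per-attempt structured events that monitoring tools can query. A minimal in-process sketch (the task and payload are made up for illustration):

```python
import json
import time

def run_with_retries(task, payload, max_attempts=3, base_delay=0.01):
    """Run a pipeline task with exponential backoff, emitting one JSON log
    line per attempt (the kind of structured event CloudWatch Logs Insights
    queries work well against)."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = task(payload)
            print(json.dumps({"task": task.__name__, "attempt": attempt,
                              "status": "succeeded"}))
            return result
        except Exception as exc:
            print(json.dumps({"task": task.__name__, "attempt": attempt,
                              "status": "failed", "error": str(exc)}))
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

calls = {"n": 0}
def flaky_extract(payload):
    # Hypothetical extract step that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return {"rows": payload["rows"]}

print(run_with_retries(flaky_extract, {"rows": 10}))
```

In a real Step Functions state machine the equivalent behavior would live in the `Retry` block of a state definition rather than in application code.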

APIs & Integration

  • API design for both batch and real-time data delivery (REST/GraphQL endpoints for AI/reporting/BI consumption).

Job Responsibilities

  • Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms.
  • Develop and manage data architectures, data lakes, and data warehouses on AWS (e.g. S3, Redshift, Glue, Athena).
  • Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security.
  • Optimize the performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines.
  • Work with both structured and unstructured data, ensuring reliability and scalability.
  • Implement data governance, security, and compliance standards.
  • Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure-as-Code (IaC).
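
The first responsibility — extract, transform, load — can be sketched as three small functions over an in-memory source; the CSV layout, column names, and filter below are invented for illustration, not part of the role:

```python
import csv
import io

# Toy batch ELT step: extract rows from a CSV source, transform (filter and
# derive a column), and load into an in-memory "table".
SOURCE_CSV = """event_id,country,amount
1,DE,120.0
2,FR,80.0
3,DE,45.5
"""

def extract(raw: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    return [
        {"event_id": int(r["event_id"]),
         "country": r["country"],
         "amount_cents": int(float(r["amount"]) * 100)}
        for r in rows
        if r["country"] == "DE"  # example of a domain-specific filter
    ]

def load(rows: list[dict], table: dict) -> None:
    for r in rows:
        table[r["event_id"]] = r

warehouse: dict = {}
load(transform(extract(SOURCE_CSV)), warehouse)
print(warehouse)
```

In the stack named by this posting, `extract` would read from S3, `transform` would run on Spark/EMR or in dbt models, and `load` would write to Redshift or back to S3 via Glue.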


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company


Job Summary: Able to provide guidance in all areas relating to information security in order to align and establish information security strategy with business requirements. Primary Job Responsibilities: Cloud Security and/or Experience is preferred. Automation, Scripting, Powe ...
