Job Title: Associate Data Architect, Master Data Management (MDM)
Location:
Pune - Hybrid
Experience:
10 years of experience in Data Architecture and Data Engineering/Integration, with strong exposure to Data Modelling and Database (RDBMS) Management.
About the Role
We are seeking an Associate Data/Database Architect to join our core product architecture team building an enterprise-grade multi-domain Master Data Management (MDM) product platform.
You will play a key role in optimizing and extending the MDM data model, implementing efficient data ingestion and entity resolution mechanisms, and ensuring the system supports multiple domains such as Party (Individual/Organization), Product, Location, Policy, and Relationship in a cloud-native and scalable manner.
Key Responsibilities
Data Modeling & Architecture
- Enhance and extend the existing Party-based data model into a multi-domain MDM schema (Party, Product, Location, Relationship, Policy, etc.).
- Design and maintain canonical data models and staging-to-core mappings for multiple source systems.
- Implement auditability, lineage, and soft-delete frameworks within the MDM data model.
- Contribute to the creation of golden records, trust scores, match/merge logic, and data survivorship rules (a simplified sketch follows this list).
- Ensure the model supports real-time and batch data mastering across multiple domains.
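As a rough illustration of the survivorship idea referenced above, the Python sketch below assembles a golden record from matched source records. The field names, trust scores, and precedence rules are hypothetical, not the platform's actual logic:

```python
from datetime import datetime

# Hypothetical source records for one matched Party entity; sources,
# trust scores, and attributes are illustrative only.
records = [
    {"source": "CRM", "trust": 0.9, "updated": datetime(2024, 5, 1),
     "name": "Jane Doe", "email": None, "phone": "555-0101"},
    {"source": "BILLING", "trust": 0.7, "updated": datetime(2024, 6, 15),
     "name": "J. Doe", "email": "jdoe@example.com", "phone": None},
]

def survive(records, attrs):
    """Golden record: per attribute, keep the non-null value from the most
    trusted source, breaking ties by recency."""
    golden = {}
    for attr in attrs:
        candidates = [r for r in records if r.get(attr) is not None]
        if candidates:
            best = max(candidates, key=lambda r: (r["trust"], r["updated"]))
            golden[attr] = best[attr]
    return golden

print(survive(records, ["name", "email", "phone"]))
# -> {'name': 'Jane Doe', 'email': 'jdoe@example.com', 'phone': '555-0101'}
```

In a real MDM engine, survivorship rules are typically configurable per attribute and per source system rather than hard-coded as here.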
Data Engineering & Integration
- Help optimize data ingestion and ETL/ELT pipelines using Python, PySpark, SQL, and/or Informatica (or equivalent tools).
- Design and implement data validation, profiling, and quality checks to ensure consistent master data (see the PySpark sketch after this list).
- Work on data harmonization, schema mapping, and standardization across multiple source systems.
- Help build efficient ETL mappings from canonical staging layers to MDM core data models in PostgreSQL.
- Develop REST APIs or streaming pipelines (Kafka/Spark) for real-time data processing and entity resolution.
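As a brief example of the validation and standardization responsibilities above, a PySpark staging check might look like the following; the paths, column names, and rules are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("party-staging-checks").getOrCreate()

# Hypothetical staging extract of Party records; the path is a placeholder.
staging = spark.read.parquet("s3://example-bucket/staging/party/")

# Standardize: trim/upper-case names, lower-case emails, empty strings to null.
clean = (
    staging
    .withColumn("full_name", F.upper(F.trim(F.col("full_name"))))
    .withColumn(
        "email",
        F.when(F.col("email") == "", None).otherwise(F.lower(F.col("email"))),
    )
)

# Validate: flag rows failing basic quality rules before the core load.
checked = clean.withColumn(
    "dq_failed",
    F.col("full_name").isNull()
    | ~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
)

# Route passing rows onward; failed rows would go to a quarantine table.
checked.filter(~F.col("dq_failed")).write.mode("append").parquet(
    "s3://example-bucket/core/party_clean/"
)
```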
Cloud & Platform Engineering
- Implement and optimize data pipelines on AWS or Azure using native services (e.g., AWS Glue, Lambda, S3, Redshift, Azure Data Factory, Synapse, Data Lake); a minimal orchestration example follows this list.
- Deploy and manage data pipelines and databases following cloud-native, cost-effective, and scalable design principles.
- Collaborate with DevOps teams on CI/CD, infrastructure-as-code, and data pipeline and database deployment/migration automation.
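As a minimal sketch of orchestrating such a pipeline with AWS-native services, the boto3 snippet below starts a hypothetical Glue job and checks its state; the job name, region, and arguments are placeholders:

```python
import boto3

glue = boto3.client("glue", region_name="ap-south-1")

# Start a (hypothetical) Glue ETL job that loads landing files into staging.
run = glue.start_job_run(
    JobName="mdm-party-staging-load",
    Arguments={"--source_prefix": "s3://example-bucket/landing/party/"},
)

# Check the run state; production code would poll with backoff or use events.
status = glue.get_job_run(JobName="mdm-party-staging-load", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])
```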
Governance Security & Compliance
- Implement data lineage, versioning, and stewardship processes.
- Ensure compliance with data privacy and security standards (GDPR, HIPAA, etc.).
- Partner with Data Governance teams to define data ownership, data standards, and stewardship workflows.
Requirements
Technical Skills Required
Core Skills
- Data Modelling: Expert-level in Relational (3NF) and Dimensional (Star/Snowflake) modelling; hands-on with Party data models, multi-domain MDM, and canonical models.
- Database: PostgreSQL (preferred) or any enterprise RDBMS.
- ER Modelling Tools: Erwin, ER/Studio, Database Markup Language (DBML).
- ETL / Data Integration: Informatica, Python, PySpark, SQL, or similar tools.
- Cloud Platforms: AWS or Azure.
- Programming: Advanced SQL, Python, PySpark, and/or UNIX/Linux scripting.
- Data Quality & Governance: Familiarity with data quality rules, profiling, match/merge, and entity resolution.
- DevOps, Version Control & CI/CD: Git, Azure DevOps, Jenkins, Terraform, Redgate Flyway (preferred).
Database Design & Optimization (PostgreSQL)
- Design and maintain normalized and denormalized models using advanced features (schemas, partitions, views, CTEs, JSONB, arrays).
- Build and optimize complex SQL queries, materialized views, and data marts for performance and scalability.
- Tune RDBMS (PostgreSQL) performance: indexes, query plans, vacuum/analyze, statistics, parallelism, and connection management.
- Leverage RDBMS (PostgreSQL) extensions such as the following (a usage sketch follows this list):
  - pg_trgm for fuzzy matching and probabilistic search.
  - fuzzystrmatch for phonetic name matching and pgvector for semantic (embedding-based) similarity.
  - hstore and jsonb for flexible attribute storage.
- Implement RBAC, row-level security, partitioning, and logical replication for scalable MDM deployment.
- Work with stored procedures, functions, and triggers for data quality checks and lineage automation.
- Implement HA/DR, backup/restore, database-level encryption (at rest and in transit), and column-level encryption for PII/PHI data.
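To make the extension usage above concrete, here is a small psycopg2 sketch of pg_trgm-based fuzzy matching against a hypothetical party table; the connection string and schema are assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=mdm user=mdm_app")  # placeholder credentials

with conn, conn.cursor() as cur:
    # Enable trigram support and index the name column for fast similarity scans.
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_party_name_trgm
        ON party USING gin (full_name gin_trgm_ops);
    """)

    # Fuzzy candidate lookup: rank parties by trigram similarity to the input.
    cur.execute("""
        SELECT party_id, full_name, similarity(full_name, %s) AS score
        FROM party
        WHERE full_name %% %s  -- %% escapes pg_trgm's '%' similarity operator
        ORDER BY score DESC
        LIMIT 10;
    """, ("Jon Smiht", "Jon Smiht"))
    for party_id, name, score in cur.fetchall():
        print(party_id, name, round(score, 3))
```

The GIN index with gin_trgm_ops keeps the similarity operator index-backed, which matters once the party table grows to millions of rows.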
Good to Have
- Knowledge of Master Data Management (MDM) domains such as Customer, Product, etc.
- Experience with graph databases like Neo4j for relationship and lineage tracking.
- Knowledge of probabilistic and deterministic matching, ML-based entity resolution, or AI-driven data mastering.
- Experience in data cataloging, data lineage tools, or metadata management platforms.
- Familiarity with data security frameworks and Well-Architected Framework principles.
Soft Skills
- Strong analytical, conceptual, and problem-solving skills.
- Ability to collaborate in a cross-functional agile environment.
- Excellent communication and documentation skills.
- Self-driven, proactive, and capable of working with minimal supervision.
- Strong desire to innovate and build scalable, reusable data frameworks.
Education
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related discipline.
- Certifications in AWS/Azure, Informatica, or Data Architecture are a plus.
Benefits
Why Join Us
- Be part of a cutting-edge MDM product initiative blending data architecture, engineering, AI/ML, and cloud-native design.
- Opportunity to shape the next-generation data mastering framework for multiple industry domains.
- Gain deep exposure to data mastering, lineage, probabilistic search, and graph-based relationship management.
- Competitive compensation, flexible working, and a technology-driven culture.
Required Skills:
- Proficiency in Python programming.
- Advanced knowledge in mathematics and algorithm development.
- Experience in developing machine learning and deep learning models.
- Strong understanding of neural network architectures, with emphasis on GenAI and LLMs.
- Skilled in data processing and visualization.
- Experienced in natural language processing.
- Knowledgeable in AI/ML deployment, DevOps practices, and cloud.
- In-depth understanding of AI security principles and practices.