Topic 1: Design of a Schema-Resilient Data Ingestion Architecture for MASS Analytics Using Apache NiFi and Apache Iceberg
Description:
MASS Analytics products ingest data from external client systems such as Snowflake and process it through analytics models and Always-On Analytics (AOA) workflows.
Frequent schema changes in source systems can break ingestion, modeling, and automation pipelines, causing downtime and manual intervention.
This project aims to design and implement a robust data ingestion and storage architecture using Apache NiFi and Apache Iceberg to detect, control, and manage schema evolution.
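To give a flavor of what controlled schema evolution could look like on the Iceberg side, here is a minimal sketch using the PyIceberg library: it compares an incoming batch against the current table schema and additively registers any unseen fields before ingestion. The catalog name, table name, and the choice to default new columns to optional strings are illustrative assumptions, not part of the project specification.

    from pyiceberg.catalog import load_catalog
    from pyiceberg.types import StringType

    def evolve_schema_for_batch(records: list[dict]) -> None:
        """Additively evolve the target Iceberg table to cover new source fields."""
        catalog = load_catalog("default")                      # assumed catalog name
        table = catalog.load_table("analytics.client_sales")   # hypothetical table
        known = {field.name for field in table.schema().fields}
        incoming = {key for record in records for key in record}
        new_fields = sorted(incoming - known)
        if new_fields:
            # New columns land as optional strings in this sketch; type
            # promotion and validation are separate, downstream concerns.
            with table.update_schema() as update:
                for name in new_fields:
                    update.add_column(name, StringType())

A NiFi flow could call such a step (e.g. via a scripted processor or an external service) before routing records to the Iceberg writer, so that additive source changes never break ingestion.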
Key attributes / Main competencies:
Java and Python programming
Relational databases and SQL
Data modeling and schema management
Data pipeline design and integration
Problem-solving and analytical skills
Software engineering principles
Learning Outcomes:
Understand challenges of schema evolution in large-scale analytics platforms
Design resilient data pipelines decoupled from source system changes
Implement controlled schema evolution using modern Lakehouse technologies
Evaluate pipeline stability and performance under schema variability
Topic 2: Design of an Intelligent Orchestration Framework for MASS Analytics Always-On Analytics (AOA) Workflows Using MCP and Large Language Models
Description:
The project exposes AOA components as Model Context Protocol (MCP) tools and uses an LLM to dynamically plan, execute, and monitor end-to-end workflows. The solution handles failures, conditional steps, and component dependencies through policy-driven decision making. Built-in guardrails ensure secure, explainable, and auditable execution suitable for enterprise environments. The outcome is a more resilient, adaptive, and maintainable AOA pipeline orchestration framework.
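To make the orchestration idea concrete, the sketch below is a deliberately simplified, dependency-free stand-in: a registry of MCP-style tools, a planner callback in place of the LLM, and an allow-list guardrail producing an auditable trace. All names (refresh_model, validate_data, the policy set) are hypothetical.

    from typing import Callable, Optional

    TOOLS: dict[str, Callable[[dict], dict]] = {}      # MCP-style tool registry

    def tool(name: str):
        """Register a function as an invocable tool."""
        def register(fn: Callable[[dict], dict]):
            TOOLS[name] = fn
            return fn
        return register

    @tool("refresh_model")
    def refresh_model(args: dict) -> dict:
        return {"status": "ok", "model": args.get("model")}

    ALLOWED = {"refresh_model", "validate_data"}       # guardrail: policy allow-list

    def run_workflow(plan_next: Callable[[list[dict]], Optional[dict]]) -> list[dict]:
        """Ask the planner (the LLM in the real system) for steps until it stops."""
        audit: list[dict] = []                         # auditable execution trace
        while (step := plan_next(audit)) is not None:
            name, args = step["tool"], step.get("args", {})
            if name not in ALLOWED:                    # block off-policy tool calls
                audit.append({"tool": name, "error": "blocked by policy"})
                continue
            try:
                audit.append({"tool": name, "result": TOOLS[name](args)})
            except Exception as exc:                   # failure-handling hook
                audit.append({"tool": name, "error": str(exc)})
        return audit

A real implementation would replace plan_next with an LLM call constrained to the same policy, use the official MCP SDK for the tool layer, and persist the audit trace to queryable logs.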
Key attributes / Main competencies:
Large Language Models and AI-assisted systems
Distributed systems and workflow orchestration
API-based system integration and MCP concepts
Software architecture and modular design
Learning Outcomes:
Understanding of LLM-based orchestration and decision-making systems
Ability to design and integrate distributed workflow components
Application of policy-driven control and guardrails in AI systems
Analysis and handling of failures in automated pipelines
Evaluation of system resilience, explainability, and maintainability
Topic 3: Design and Implementation of an Always-On Analytics (AOA) Application for the Databricks Marketplace That Continuously Runs Cost-Efficient Analytics Pipelines
Description:
The project focuses on incremental data processing to refresh models automatically while minimizing compute usage.
It includes monitoring mechanisms to track data quality, model stability, and performance over time.
The application generates actionable insights and prioritized recommendations ready to drive the next dollar of value.
The solution is built as a scalable, reusable, and marketplace-ready Databricks app.
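One way to picture the incremental-refresh idea is the PySpark sketch below, which appends only rows newer than a high-water mark. The table names (raw.events, analytics.features) and the ingested_at timestamp column are assumptions, and the target table is assumed to already exist; a production app would track the mark in checkpoint state rather than re-scanning the target.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    def incremental_refresh(source: str = "raw.events",
                            target: str = "analytics.features") -> None:
        """Append only rows newer than the last processed timestamp."""
        # High-water mark: the newest timestamp already present in the target.
        last = spark.table(target).agg(F.max("ingested_at")).collect()[0][0]
        new_rows = spark.table(source)
        if last is not None:
            new_rows = new_rows.filter(F.col("ingested_at") > F.lit(last))
        if new_rows.isEmpty():        # skip the write when there is nothing new
            return
        new_rows.write.mode("append").saveAsTable(target)

Processing only the delta on each run is what keeps the always-on pipeline cost-efficient: compute scales with new data, not with table size.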
Key attributes / Main competencies:
Incremental and batch data processing
Machine learning model lifecycle management
Data quality monitoring and validation
Performance analysis and system monitoring
Distributed computing with Databricks and Spark
Scalable application design
Learning Outcomes:
Understanding incremental data processing strategies to optimize compute usage
Ability to automate model refresh and evaluation pipelines
Application of data quality and model stability monitoring techniques
Design of scalable and reusable analytics applications
Generation of data-driven insights and business recommendations
We specialize in Marketing Mix Modeling (MMM) and Media Effectiveness Measurement. We offer our clients a comprehensive MMM software suite, backed by a wide range of managed-services solutions, to help identify sales drivers, measure marketing ROI (MROI), and optimize marketing budgets.