Experience: 7 Years
Location: Pan India
Work Mode: Hybrid
Shift: General
Role Overview
We are looking for a Senior Python Data Engineer with strong hands-on expertise in Python, Pandas, PySpark, and Azure-based data platforms. The role involves designing scalable ETL pipelines, working with structured and unstructured data, and building modern data solutions, including vector databases, to support advanced analytics and RAG-style workloads.
Primary Responsibilities
- Design, build, and maintain robust, scalable ETL pipelines for batch and large-scale data processing (a minimal PySpark sketch follows this list)
- Develop advanced Python-based data engineering solutions using Pandas and PySpark
- Implement data ingestion and processing workflows using Azure Synapse and related Azure services
- Apply data modeling and data quality best practices, including schema design, validation, and testing
- Work with SQL and file-based data sources (SQL Server, PostgreSQL, CSV, JSON, Parquet)
- Automate manual data ingestion and processing tasks through scripts, jobs, and orchestration-ready workflows
- Collaborate with cross-functional teams and clearly communicate progress, risks, and blockers
- Follow version control best practices using Git and GitHub (branching, PRs, code reviews)
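For illustration, here is a minimal sketch of the kind of batch ETL pipeline this role covers, written in PySpark. The paths, the "orders" dataset, and the column names (order_id, amount, order_ts) are hypothetical placeholders, not a prescribed design.

```python
# Minimal batch ETL sketch: extract from Parquet, apply quality rules, load partitioned output.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Extract: read a file-based source (Parquet here; CSV/JSON readers work similarly)
orders = spark.read.parquet("/data/raw/orders")

# Transform: basic cleansing and validation-style filtering
cleaned = (
    orders
    .dropDuplicates(["order_id"])                      # de-duplicate on the business key
    .filter(F.col("amount").isNotNull())               # simple data quality rule
    .withColumn("order_date", F.to_date("order_ts"))   # derive a partition column
)

# Load: write partitioned output for downstream analytics
cleaned.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/orders")
```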
Mandatory Skills
- Strong proficiency in Python for data engineering
- Hands-on experience with Pandas and PySpark
- Solid experience with Azure Cloud Services, especially Azure Synapse
- Strong understanding of ETL processes, data modeling, and data quality standards
- Experience working with structured and unstructured data
- Familiarity with MongoDB, Qdrant, and vector database concepts (embeddings, similarity search, RAG patterns); see the Qdrant sketch after this list
- Working knowledge of Git and GitHub
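For illustration, here is a minimal sketch of the vector-database concepts named above, using the qdrant-client Python library. The collection name, toy vectors, and payloads are hypothetical; a real RAG pipeline would generate embeddings with a model rather than hard-coding them.

```python
# Minimal vector-database sketch: create a collection, upsert embeddings with
# payloads, and run a similarity search. Collection name, vectors, and payloads
# are hypothetical toy values.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # in-memory instance for local experiments

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Upsert points: each pairs an embedding with a filterable payload
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0], payload={"source": "wiki"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.1, 0.0], payload={"source": "pdf"}),
    ],
)

# Similarity search: nearest neighbours to a query embedding
# (newer client versions also offer query_points for the same purpose)
hits = client.search(collection_name="docs", query_vector=[0.1, 0.8, 0.2, 0.0], limit=1)
print(hits[0].id, hits[0].score)
```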
Desired / Good-to-Have Skills
- Orchestration tools: Conductor, Airflow, Prefect, Azure Data Factory
- Azure DevOps (ADO) boards and Scrum workflow discipline
- Local development optimization: Docker, reproducible dev environments, Makefiles
- Performance tuning in Spark / Synapse: partitioning, caching, join strategies (see the sketch after this list)
- Vector search best practices: HNSW indexing, hybrid search, payload design, filtering
- Observability for data pipelines: logging, metrics, alerting
- Event-driven and scheduled ingestion automation patterns
- Strong collaboration mindset and proactive ownership of shared initiatives
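For illustration, here is a minimal sketch of the Spark tuning techniques listed above: repartitioning on a join key, caching a reused DataFrame, and forcing a broadcast join. The table paths, partition count, and join key are hypothetical.

```python
# Minimal Spark tuning sketch: repartition on the join key, cache the reused
# side, and broadcast the small dimension table to avoid a shuffle join.
# Table paths, the partition count, and "customer_id" are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

facts = spark.read.parquet("/data/curated/orders")      # large fact table
dims = spark.read.parquet("/data/curated/customers")    # small dimension table

# Repartition on the join key to reduce shuffle skew, then cache for reuse
facts = facts.repartition(200, "customer_id").cache()

# Broadcast the small side so Spark performs a broadcast-hash join
enriched = facts.join(broadcast(dims), "customer_id")
enriched.explain()  # inspect the physical plan to confirm the join strategy
```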