Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark and Azure Data Factory (a short PySpark sketch follows this list).
- Work closely with business stakeholders, analysts, and data scientists to understand data requirements and deliver reliable solutions.
- Optimize ETL workflows for performance, scalability, and reliability.
- Implement best practices for data ingestion, transformation, and integration across multiple sources.
- Ensure data quality, governance, and security across the data lifecycle.
- Troubleshoot and resolve issues related to data pipelines, storage, and performance.
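For illustration only, here is a minimal sketch of the kind of batch pipeline this role involves: ingest raw files, apply basic cleansing, and write curated Parquet. The storage paths, column names, and cleansing rules are assumptions made up for the example, not details from this project.

```python
# Minimal PySpark ETL sketch; all paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_load").getOrCreate()

# Ingest: read raw CSV landed by an upstream process (placeholder data-lake path).
raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("abfss://raw@examplelake.dfs.core.windows.net/orders/2024-06-01/")
)

# Transform: deduplicate, standardise types, derive a load date, drop bad rows.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("load_date", F.current_date())
       .filter(F.col("amount") > 0)
)

# Load: write partitioned Parquet to the curated zone (placeholder path).
(
    clean.write
         .mode("overwrite")
         .partitionBy("load_date")
         .parquet("abfss://curated@examplelake.dfs.core.windows.net/orders/")
)
```

In practice, a job like this would typically run as a Databricks or Synapse Spark activity that an Azure Data Factory pipeline triggers and parameterises.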
Required Skills & Qualifications
- 5 years of total experience, with 3 years of relevant experience in PySpark, Azure Data Factory, and Python.
- Strong experience in building large-scale data pipelines and ETL workflows.
- Hands-on expertise in PySpark for data processing and transformation.
- Proficiency in Azure Data Factory (ADF) for orchestrating and automating workflows.
- Solid understanding of Python for scripting, data handling, and automation.
- Strong SQL skills and the ability to work with relational and non-relational databases (see the Spark SQL sketch after this list).
- Good knowledge of data warehousing concepts and performance optimization.
- Exposure to the Azure ecosystem (Data Lake, Databricks, Synapse Analytics, etc.) preferred.
- Excellent problem-solving, analytical, and communication skills.
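As a small illustration of the PySpark-plus-SQL skills listed above, the sketch below registers a DataFrame as a temporary view and queries it with Spark SQL. The table and column names are invented for the example.

```python
# Sketch of mixing Spark SQL with the DataFrame API; names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_sql_example").getOrCreate()

# A tiny in-memory dataset standing in for a real orders table.
orders = spark.createDataFrame(
    [
        ("C001", "2024-06-01", 120.0),
        ("C001", "2024-06-02", 80.0),
        ("C002", "2024-06-01", 50.0),
    ],
    ["customer_id", "order_date", "amount"],
)
orders.createOrReplaceTempView("orders")

# Aggregate spend per customer in plain SQL, then continue in DataFrame code.
top_spenders = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 100
    ORDER BY total_spend DESC
""")
top_spenders.show()
```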
Nice to Have (Optional)
- Experience with CI/CD pipelines for data solutions.
- Knowledge of data governance, security, and compliance frameworks.
- Familiarity with real-time data streaming technologies (Kafka, Event Hubs, etc.); a streaming sketch follows this list.
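For the streaming item above, a hedged sketch of reading a stream with Spark Structured Streaming via the Kafka source (Azure Event Hubs exposes a Kafka-compatible endpoint). The broker, topic, and console sink are placeholders, and the spark-sql-kafka connector package must be available to Spark.

```python
# Structured Streaming read from Kafka; broker and topic names are placeholders.
# Requires the spark-sql-kafka-0-10 connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_stream").getOrCreate()

events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
         .option("subscribe", "orders-events")                # placeholder topic
         .load()
)

# Kafka delivers key/value as binary; cast the payload to string before parsing.
parsed = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# A console sink keeps the sketch simple; a real pipeline would write to a
# durable sink (e.g. a data lake path) with checkpointing configured.
query = (
    parsed.writeStream
          .format("console")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```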
Additional Details
- Cloud Preference: Azure only (AWS experience not required).
- Budget/CTC Range: 18 LPA.
- Contract/Full-Time: Full-Time