Title: Technical Specialist
Location: Mumbai
Education: Bachelor's Degree
Job Description:
- Build scalable ETL/ELT pipelines using Databricks (PySpark, SQL, Spark Streaming).
- Develop and optimize Delta Lake tables, leveraging ACID transactions, schema evolution, and time travel.
- Implement Unity Catalog for data governance and access control; manage cluster configurations, job workflows, and performance tuning in Databricks.
- Design and implement batch and streaming pipelines using Spark Structured Streaming (a minimal sketch follows this list).
- Integrate Databricks with multiple data sources (RDBMS, APIs, cloud storage, message queues).
- Develop reusable, modular, and automated data processing frameworks.
- Implement CI/CD pipelines for Databricks using GitHub Actions or Azure DevOps; handle cluster management and job orchestration using the Databricks REST APIs (see the second sketch after this list).
- Maintain code quality, unit tests, and documentation.
- Write and optimize complex SQL queries and statements to ensure high performance and efficient data retrieval.
- Apply strong database design skills, including normalization, data modelling, and relational schema creation.
- Conduct performance analysis, troubleshoot database issues such as slow queries or deadlocks, and implement solutions.
- Design and implement database structures, including tables, schemas, views, stored procedures, functions, and triggers.
- Optimize database performance through query tuning, indexing, and performance analysis.
- Ensure data integrity, security, and adherence to compliance standards.
- Bring strong Python skills combined with expertise in Apache Spark for large-scale data processing; core abilities include building efficient ETL pipelines, optimizing distributed jobs, and handling large-scale data transformations.
- Expertise in Python programming, Spark APIs, and parallel processing.
- Proficiency in Python (including Pandas and NumPy) for data manipulation and scripting.
- Deep knowledge of PySpark APIs such as DataFrames, RDDs, and Spark SQL for querying and processing.
- Familiarity with RESTful APIs, batch processing, CI/CD, and monitoring of data jobs.
- Optimize Spark jobs for performance, troubleshoot issues, and ensure data quality across systems.
- Collaborate with data engineers and scientists to implement workflows, conduct code reviews, and integrate with cloud platforms such as AWS or Azure.
- Design, develop, and maintain scalable data pipelines and ETL processes using Azure Databricks.
- Build data transformation workflows using Python or Scala.
- Work with data lakes using Delta Lake.
- Integrate data from multiple sources such as APIs, databases, and cloud storage.
- Monitor and optimize data workflows for performance and reliability.
- Collaborate with data scientists, analysts, and business teams.
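
For illustration only (not part of the role's requirements), here is a minimal sketch of the kind of streaming pipeline described above: a Structured Streaming job that ingests JSON files from cloud storage and appends them to a Delta Lake table. It assumes a Databricks notebook, where the `spark` session is provided by the runtime; the schema, paths, and table name are hypothetical.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical schema for incoming order events.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Incrementally read JSON files landing in cloud storage (path is a placeholder).
stream = (
    spark.readStream
    .schema(schema)
    .json("/mnt/raw/orders/")
    .withColumn("ingest_date", F.to_date("updated_at"))
)

# Append to a Delta table; the checkpoint lets the job resume where it left off.
query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders/")
    .outputMode("append")
    .partitionBy("ingest_date")
    .toTable("bronze.orders")  # hypothetical catalog table name
)
```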
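Similarly, the job-orchestration bullet can be sketched with a short call to the Databricks Jobs REST API (`/api/2.1/jobs/run-now`) that triggers an existing job. The workspace URL, token, and job ID below are placeholders.

```python
import requests

DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                            # placeholder credential

# Trigger a run of an existing Databricks job (job_id and params are hypothetical).
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 123, "notebook_params": {"run_date": "2024-01-01"}},
)
resp.raise_for_status()
print("Triggered run:", resp.json()["run_id"])
```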
Required Experience:
IC