Experience: 4 to 6 years
Location: Pune, Mumbai, Chennai, Bangalore
Shift: Day shift
Work week: 5 days
Job Responsibilities:
- Design, develop, and optimize data processing pipelines using PySpark on AWS Databricks, ensuring scalability and high performance.
- Implement data ingestion, transformation, and aggregation workflows to support advanced analytics and enterprise reporting needs.
- Collaborate with cross-functional teams to build cloud-native data solutions leveraging AWS services such as S3 and EC2.
- Write efficient Python code for automation, orchestration, and integration of data workflows across environments.
- Ensure data quality, reliability, and governance through best practices including error handling, auditing, and logging.
- Participate in performance tuning of PySpark jobs, cloud cost optimization, and continuous improvement of the data platform.
Mandatory Skills:
- 4 to 8 years of professional and relevant experience in the software industry.
- Strong hands-on expertise in PySpark for distributed data processing and transformation.
- Proven experience with Python programming, including implementing reusable, production-grade code.
- Practical knowledge of AWS Databricks for building and orchestrating large-scale data pipelines.
- Demonstrated experience in processing structured and unstructured data using Spark clusters and cloud data platforms.
- Ability to apply data engineering best practices, including version control, CI/CD for data pipelines, and performance tuning.
Preferred Skills:
- Working knowledge of SQL for data querying, analysis, and troubleshooting.
- Experience using AWS S3 for object storage and EC2 for compute orchestration in cloud environments.
- Understanding of cloud-native data architectures and principles of data security and governance.
- Exposure to BFSI domain use cases and familiarity with handling sensitive financial data.
- Familiarity with CI/CD tools and cloud-native deployment practices.
- Familiarity with ETL scripting and data pipeline automation.
- Knowledge of big data ecosystems and distributed computing.
Qualifications:
BE in IT or equivalent
Additional Information:
Remote Work:
No
Employment Type:
Full-time