AWS + Databricks Data Engineer Lead - Deloitte

Job Location: Bengaluru, India
Monthly Salary: Not Disclosed
Posted on: 16 hours ago
Vacancies: 1

Job Summary

Experience Band: 6-10 years

Location: Bangalore.

Band: Consultant and Sr. Consultant.

Mandatory Skills: AWS Glue, Lambda, PySpark, PySQL, and Databricks.

Good-to-have Skills: Redshift, Airflow, DLT, Databricks administration

This position is responsible for developing ETL/ELT and file-movement processes for data and integrations. The key responsibility is to process and move data between different compute and storage services, as well as on-premises data sources, at specified intervals. The employee will also be responsible for the creation, scheduling, orchestration, and management of data pipelines.

Data engineers are responsible for ensuring the availability and quality of data needed for analysis and business transactions. This includes data integration, acquisition, cleansing, and harmonization, and transforming raw data into curated datasets for data science, data discovery, and BI/analytics. They are responsible for developing, constructing, testing, and maintaining data sets and scalable data-processing systems.

Data engineers work most closely with Data Architects and Data Scientists. They also work with business and IT groups beyond the data sphere, understanding the enterprise infrastructure and the many source systems.

Input is raw datasets; output is analytics-ready, integrated/curated datasets.
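
As a rough illustration of that raw-to-curated flow, the sketch below shows a PySpark job of the kind run on Glue or Databricks. The bucket paths, column names, and schema are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical raw-to-curated job; all paths, columns, and the schema
# are illustrative placeholders, not taken from this job posting.
spark = SparkSession.builder.appName("raw_to_curated").getOrCreate()

# Ingest raw JSON events landed in S3 by an upstream file-movement step.
raw = spark.read.json("s3://example-raw-bucket/events/")

# Cleanse and harmonize: drop rows missing keys, normalize the
# timestamp type, and deduplicate on the business key.
curated = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .dropDuplicates(["event_id"])
)

# Publish an analytics-ready Delta dataset, partitioned by date.
(curated.withColumn("event_date", F.to_date("event_ts"))
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("s3://example-curated-bucket/events/"))
```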

Responsibilities

  • Development experience as a data engineer with a focus on core tools and technologies such as AWS Lambda, Glue, S3, PySpark, SQL, and Databricks. Experience with Redshift, Athena, and Airflow is an added advantage.
  • Design, develop, and optimize ETL/ELT data pipelines using Glue/Lambda with Databricks.
  • Strong SQL and Python/PySpark skills for data transformation and analysis.
  • Work with structured, semi-structured, and unstructured data sources.
  • Troubleshoot and optimize data workflows for scalability and performance.
  • Strong experience with optimized, performance-oriented ELT/ETL design and implementation for large and complex datasets using PySpark, SQL, Databricks, Glue, or Lambda.
  • Design and build high-performance, scalable data pipelines adhering to delta/medallion/data-lakehouse, data warehouse, and data mart standards for optimal storage, retrieval, and processing of data (a sketch follows this list).
  • Databricks: hands-on experience with Databricks for data processing and analytics, serverless SQL warehouses, Unity Catalog, and optimization of code and clusters.
  • Develop data profiling and data quality methodologies and embed them into the processes involved in transforming data across the systems.
  • Experience in Agile development and code deployment using GitHub and CI/CD pipelines.
  • Ability to work with business owners to define key business requirements and convert them into technical specifications.
  • Experience with security models and development on large data sets.
  • Responsible for system testing, ensuring effective resolution of defects, timely discussion of business issues, and appropriate management of resources relevant to data and integration.
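
A minimal sketch of the bronze-to-silver (medallion) step referenced in the list above, with a basic data-quality gate embedded, assuming Databricks with Delta tables. The table names (bronze.orders, silver.orders, quarantine.orders) and the quality rules are hypothetical examples, not requirements from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal medallion-style step: bronze (raw) -> silver (validated).
# Table names and quality rules are hypothetical examples only.
spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze.orders")

# Basic data-quality gate: non-null business key, non-negative amount.
valid = bronze.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
rejected = bronze.subtract(valid)

# Quarantine failures for later inspection instead of silently dropping them.
rejected.write.format("delta").mode("append").saveAsTable("quarantine.orders")

# Append validated rows into the silver layer; production code would
# typically use a MERGE for idempotent re-runs.
valid.write.format("delta").mode("append").saveAsTable("silver.orders")
```

Routing rejects to a quarantine table keeps the quality check auditable rather than losing bad rows silently.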

Secondary skills:

  • Project management: status reports, stand-ups, and knowledge of Agile methodology
  • Stakeholder management
  • Mentor data engineers and analysts
  • Provide technical direction and design reviews
  • A determined focus on the user and user experience when problem-solving
  • Team player who collaborates proactively, communicates, and shares understanding and experience within the team
  • Clear communicator, able to work in a multicultural/multilingual environment
  • Uses initiative and owns their deliverables end-to-end
  • Brings ideas to the table, is inquisitive, and is excited by new technology
  • Used to working in a flexible, independent manner on a mixture of small unstructured and large structured items
  • Provides regular considered feedback and regularly strives to improve their own ways of working

Preferred Qualifications / Certifications

  • Bachelor's degree in computer science, information technology, or management information systems, or equivalent work experience
  • Experience working in regulated environments and with internal systems, quality policies, and procedures.
  • Experience in development and deployment on cloud infrastructure.
  • Pharmaceutical or healthcare industry experience
  • AWS and Databricks certifications are good to have.

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala