
Data Engineer

Job Location

Lisbon - Portugal

Monthly Salary

Not Disclosed


Vacancy

1 Vacancy

Job Description

The Data Engineer is responsible for building and maintaining the infrastructure that supports the organization's data architecture. The role involves creating and managing data pipelines in Airflow for data extraction, processing, and loading, and ensuring their maintenance, monitoring, and stability.

 

The engineer will work closely with data analysts and end-users to provide accessible and reliable data.              

                                       

What we expect from the candidate

 

  • Candidate must be able to use Unix: must know how to use Unix commands to check processes, read files, and run bash commands. Candidate needs to know how to access a Unix server and run commands there, and, if some process is not running, check the server to see what might be going on. For example, if a Hadoop/YARN process is not running, or if some container for Airflow is not up, the candidate needs to know how to investigate.
  • Candidate must know how to list Docker containers, how to build Docker images, how to change existing images to add or remove things, and how to use and map volumes. Must know how to set up and maintain a distributed Airflow environment using Docker, including building custom Docker images using the Airflow image as the base.
  • We strongly expect that the candidate knows Airflow and its components, knows how to identify possible issues on the servers and fix them, and knows how to add more workers to the cluster. They need to make sure the containers are running fine on the servers and, if there is any issue, be able to fix it.
  • Candidate must know how to maintain a Hadoop/YARN cluster with Spark: which processes need to run on the servers, how to set up the XML configuration files for Hadoop and YARN, and how to perform commands in HDFS. They need to be able to add a new worker to the Hadoop cluster if necessary and fix any possible issues on the servers, and they need to know how to read the logs from YARN and HDFS. Must know and understand how Spark works using YARN as the resource manager.
  • Candidate must know how to develop in Python, how to manage packages with pip, how to review PRs from other people on the team, and how to maintain and use a Flask API (a minimal sketch follows this list).
  • Candidate must know SQL, including queries with CTEs and window functions, mainly on an Oracle database (an illustrative query also follows this list).
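
As a concrete illustration of the Flask point above, a minimal sketch of the kind of small Flask API the candidate would maintain could look like the following; the endpoint and the in-memory data are hypothetical and only stand in for a real service.

    # Minimal Flask API sketch (hypothetical endpoint, for illustration only).
    from flask import Flask, jsonify

    app = Flask(__name__)

    # In-memory stand-in for a real data source; a production API would
    # query a database or an HDFS/S3-backed store instead.
    _PIPELINE_STATUS = {"daily_load": "success", "hourly_sync": "running"}

    @app.route("/pipelines/<name>/status", methods=["GET"])
    def pipeline_status(name):
        """Return the last known status of a named pipeline."""
        status = _PIPELINE_STATUS.get(name)
        if status is None:
            return jsonify({"error": f"unknown pipeline: {name}"}), 404
        return jsonify({"pipeline": name, "status": status})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)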
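
The SQL expectation (CTEs and window functions on Oracle) can be pictured with the query below, run here through the python-oracledb driver; the table and column names are made up for illustration.

    # Hypothetical CTE + window-function query against Oracle, executed
    # with the python-oracledb driver. Table and column names are invented.
    import oracledb

    QUERY = """
    WITH daily_orders AS (
        SELECT customer_id,
               TRUNC(order_ts) AS order_day,
               SUM(amount)     AS daily_amount
          FROM orders
         GROUP BY customer_id, TRUNC(order_ts)
    )
    SELECT customer_id,
           order_day,
           daily_amount,
           SUM(daily_amount) OVER (
               PARTITION BY customer_id
               ORDER BY order_day
           ) AS running_total
      FROM daily_orders
    """

    def fetch_running_totals(dsn: str, user: str, password: str):
        """Run the query above and return all rows."""
        with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(QUERY)
                return cur.fetchall()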

 

Main Tasks:

  • Responsible for maintaining the infrastructure that supports the current data architecture
  • Responsible for creating data pipelines in Airflow for data extraction, processing, and loading (a minimal DAG sketch follows this list)
  • Responsible for data pipeline maintenance, monitoring, and stability
  • Responsible for providing data access to data analysts and end-users
  • Responsible for DevOps infrastructure
  • Responsible for deploying Airflow DAGs to the production environment using DevOps tools
  • Responsible for code and query optimization
  • Responsible for code review
  • Responsible for improving the current data architecture and DevOps processes
  • Responsible for delivering data in useful and appealing ways to users
  • Responsible for performing and documenting analysis, review, and study on specified regulatory topics
  • Responsible for understanding business change and requirement needs, and assessing the impact and the cost
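
To picture the kind of pipeline the tasks above refer to, here is a minimal Airflow DAG sketch with extract, process, and load steps; the DAG id, schedule, and task logic are hypothetical placeholders.

    # Minimal Airflow DAG sketch (hypothetical DAG id and task logic),
    # illustrating an extract -> process -> load pipeline.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Pull raw records from a source system (placeholder data).
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

    def process(ti, **context):
        # Transform the extracted records (placeholder logic).
        rows = ti.xcom_pull(task_ids="extract")
        return [{**row, "value": row["value"] * 2} for row in rows]

    def load(ti, **context):
        # Load the processed records into the target store (placeholder).
        rows = ti.xcom_pull(task_ids="process")
        print(f"loading {len(rows)} rows")

    with DAG(
        dag_id="example_extract_process_load",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_process = PythonOperator(task_id="process", python_callable=process)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_process >> t_load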

Qualifications:

Technical Skills:

 

  • Python
  • Experience in creating APIs in Python
  • PySpark (a short sketch follows this list)
  • Spark environment architecture
  • SQL - Oracle database
  • Experience in creating and maintaining distributed environments using Hadoop and Spark
  • Hadoop ecosystem - HDFS, YARN
  • Containerization - Docker is mandatory
  • Data lakes - experience in organizing and maintaining data lakes - S3 is preferred
  • Experience with the Parquet file format
  • Apache Airflow - experience in both pipeline development and deploying Airflow in a distributed environment
  • Apache Kafka
  • Experience in automating application deployment using DevOps tools - Jenkins is mandatory, Ansible is a plus
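
As a sketch of the PySpark and data-lake skills listed above, the snippet below reads Parquet data from an object store, aggregates it, and writes the result back; the bucket, paths, and columns are hypothetical, and on the cluster the job would be submitted with YARN as the resource manager (e.g. spark-submit --master yarn).

    # Hypothetical PySpark job: read Parquet from a data lake, aggregate,
    # and write the result back. Paths and column names are invented.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("daily_totals_example")
        .getOrCreate()
    )

    orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

    daily_totals = (
        orders
        .groupBy("customer_id", F.to_date("order_ts").alias("order_day"))
        .agg(F.sum("amount").alias("daily_amount"))
    )

    daily_totals.write.mode("overwrite").parquet(
        "s3a://example-bucket/curated/daily_totals/"
    )

    spark.stop()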

 

Language Skills

  • English                                                                                                                                      


Remote Work:

No


Employment Type:

Full-time
