Responsibilities
Integrate data from multiple sources, such as databases, APIs, or streaming platforms, to provide a unified view of the data
Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data
Identify and resolve data quality issues, monitor data pipelines for errors, and implement data governance and data quality frameworks
Enforce data security and compliance with relevant regulations and industry-specific standards
Implement data access controls and encryption mechanisms, and monitor data privacy and security risks
Optimize data processing and query performance by tuning database configurations, implementing indexing strategies, and leveraging distributed computing frameworks
Optimize data structures for efficient querying and develop data dictionaries and metadata repositories
Identify and resolve performance bottlenecks in data pipelines and systems
Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders
Document data pipelines, data schemas, and system configurations, making it easier for others to understand and work with the data infrastructure
Monitor data pipelines, databases, and data infrastructure for errors, performance issues, and system failures
Set up monitoring tools, alerts, and logging mechanisms to proactively identify and resolve issues, ensuring the availability and reliability of data
A software engineering background is a plus
Requirements
Bachelor's or Master's degree in computer science, information technology, data engineering, or a related field
Strong knowledge of databases, data structures, and algorithms
Proficiency with data engineering tools and technologies, including data integration tools (e.g. Apache Kafka, Azure IoT Hub, Azure Event Hub), ETL/ELT frameworks (e.g. Apache Spark, Azure Synapse), big data platforms (e.g. Apache Hadoop), and cloud platforms (e.g. Amazon Web Services, Google Cloud Platform, Microsoft Azure)
Expertise in working with relational databases (e.g. MySQL, PostgreSQL, Azure SQL, Azure Data Explorer) and data warehousing concepts
Familiarity with data modeling, schema design, indexing, and optimization techniques is valuable for building efficient and scalable data systems
Proficiency in languages such as Python, SQL, KQL, Java, and Scala
Experience with scripting languages like Bash or PowerShell for automation and system administration tasks
Strong knowledge of data processing frameworks like Apache Spark, Apache Flink, or Apache Beam for efficiently handling large-scale data processing and transformation tasks
Understanding of data serialization formats (e.g. JSON, Avro, Parquet) and data serialization libraries (e.g. Apache Avro, Apache Parquet) is valuable
Experience with CI/CD and GitHub that demonstrates the ability to work in a collaborative and iterative development environment
Experience with visualization tools (e.g. Power BI, Plotly, Grafana, Redash) is beneficial
Preferred Skills & Characteristics
Consistently displays dynamic, independent work habits; is goal-oriented, with a growth mindset, a can-do attitude, and strong self-motivation. Self-driven and proactive in keeping up with new technologies and programming