Key Responsibilities
- Data Pipeline Development: Design, build, and maintain scalable data pipelines to ingest, process, and transform structured and unstructured data.
- Data Modeling: Create optimized data models to support analytics, reporting, and machine learning workflows.
- ETL/ELT Processes: Develop and manage ETL/ELT workflows to ensure clean, reliable, and high-quality data.
- Database Management: Work with relational and NoSQL databases to ensure efficient storage and retrieval of large datasets.
- Cloud Data Solutions: Implement and optimize data solutions on cloud platforms like AWS, Azure, or GCP.
- Data Quality & Governance: Ensure data integrity, security, compliance, and quality across systems.
- Collaboration: Partner with data scientists, analysts, and software engineers to deliver reliable data infrastructure.
- Automation: Streamline data processes using orchestration tools and automation frameworks.
- Monitoring & Optimization: Implement monitoring, logging, and performance tuning of data systems.
- Documentation: Maintain detailed documentation of data pipelines, architecture, and workflows.
Skills
- Programming Skills: Proficiency in Python and SQL; familiarity with Java/Scala.
- Data Pipelines & ETL: Experience with ETL tools (Airflow, dbt, Informatica, Talend).
- Big Data Frameworks: Knowledge of Spark, Hadoop, Kafka, or Flink.
- Data Warehousing: Hands-on experience with Snowflake, Redshift, BigQuery, or Synapse.
- Cloud Platforms: Proficiency in AWS (Glue, Redshift, S3), Azure (Data Factory, Synapse), or GCP (BigQuery, Dataflow).
- Databases: Strong experience with relational databases (PostgreSQL, MySQL, Oracle) and NoSQL databases (MongoDB, Cassandra).
- Data Modeling: Expertise in designing star/snowflake schemas and OLTP/OLAP systems.
- DevOps & Version Control: Familiarity with Git, CI/CD pipelines, and Infrastructure as Code (Terraform).
- Data Governance & Security: Knowledge of GDPR, HIPAA, encryption, and role-based access controls.
- Analytical Skills: Strong problem-solving and optimization skills for handling big data.
- Collaboration & Communication: Ability to work in cross-functional teams and clearly document technical processes.
Skill Matrix
| Skill Category | Skills | Proficiency Level (1-5) |
| --- | --- | --- |
| Technical Skills | Data Engineering, Data Modeling, ETL, Data Quality | 4-5 |
| Programming | Python, SQL, Java/Scala, Git | 3-5 |
| Data Pipelines | Airflow, dbt, Talend, Informatica | 3-4 |
| Big Data Frameworks | Spark, Hadoop, Kafka, Flink | 3-4 |
| Data Warehousing | Snowflake, Redshift, BigQuery, Synapse | 3-4 |
| Cloud Platforms | AWS (Glue, S3, Redshift), Azure (ADF, Synapse), GCP (BigQuery, Dataflow) | 3-4 |
| Databases | PostgreSQL, MySQL, Oracle, MongoDB, Cassandra | 3-4 |
| Data Modeling | OLTP, OLAP, Star/Snowflake Schema | 4-5 |
| DevOps & Automation | CI/CD, Terraform, Docker, Kubernetes | 2-3 |
| Data Governance & Security | Data Compliance, Encryption, Role-Based Access | 3-4 |
| Analytical Skills | Query Optimization, Data Debugging, Performance Tuning | 4-5 |
| Collaboration | Cross-Functional Teamwork, Stakeholder Management | 3-4 |
| Documentation | Data Architecture, Pipeline Workflows, Standards | 3-4 |
| Research | Staying updated with emerging data engineering tools & trends | 3-4 |