Position: Data Engineer
Location: Princeton, NJ (Day 1 Onsite)
Duration: 1 Year
| | |
| --- | --- |
| Mandatory Skills | Key Skills & Technologies. Programming Languages: Python (primary), SQL. Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, EC2, EMR). Data Tools: Apache Spark, Pandas, PySpark, Airflow. Databases: PostgreSQL, MySQL, NoSQL (e.g. DynamoDB). ETL & Workflow Orchestration: AWS Glue, Apache Airflow. Version Control: Git. DevOps & CI/CD: Basic understanding of CI/CD pipelines and infrastructure as code (e.g. Terraform, CloudFormation). |
| JD | Data Pipeline Development - Design, build, and maintain scalable and reliable data pipelines to ingest, process, and transform data from various sources. Data Integration & Management - Integrate structured and unstructured data from internal and external systems. Ensure data quality, consistency, and availability across platforms. Cloud-Based Data Engineering - Leverage AWS services (e.g. S3, Lambda, Glue, Redshift, EMR) to build cloud-native data solutions. Optimize cloud resources for performance and cost-efficiency. Programming & Automation - Use Python for data manipulation, ETL workflows, and automation of data tasks. Develop reusable scripts and modules for data processing. Collaboration & Stakeholder Engagement - Work closely with data scientists, analysts, and business teams to understand data needs. Translate business requirements into technical solutions. Monitoring & Optimization - Monitor data pipelines and troubleshoot issues proactively. Continuously improve performance, scalability, and reliability of data systems. |