Job Title: Data Engineer
Location: Princeton, NJ, USA
Mandatory Skills:
Key Skills & Technologies
Programming Languages: Python (primary), SQL
Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, EC2, EMR)
Data Tools: Apache Spark, Pandas, PySpark, Airflow
Databases: PostgreSQL, MySQL, NoSQL (e.g. DynamoDB)
ETL & Workflow Orchestration: AWS Glue, Apache Airflow
Version Control: Git
DevOps & CI/CD: Basic understanding of CI/CD pipelines and infrastructure as code (e.g. Terraform, CloudFormation)
JD:
Data Pipeline Development - Design, build, and maintain scalable and reliable data pipelines to ingest, process, and transform data from various sources.
Data Integration & Management - Integrate structured and unstructured data from internal and external systems.
Ensure data quality, consistency, and availability across platforms.
Cloud-Based Data Engineering - Leverage AWS services (e.g. S3, Lambda, Glue, Redshift, EMR) to build cloud-native data solutions.
Optimize cloud resources for performance and cost-efficiency.
Programming & Automation - Use Python for data manipulation, ETL workflows, and automation of data tasks.
Develop reusable scripts and modules for data processing.
Collaboration & Stakeholder Engagement - Work closely with data scientists, analysts, and business teams to understand data needs.
Translate business requirements into technical solutions.
Monitoring & Optimization - Monitor data pipelines and troubleshoot issues proactively.
Continuously improve the performance, scalability, and reliability of data systems.