Data Engineer

Job Location: Norwell, MA - USA
Monthly Salary: Not Disclosed
Posted on: 1 hour ago
Vacancies: 1 Vacancy

Job Summary

Position: Data Engineer

Location: Norwell, MA ***Day 1 Onsite***

Duration: 1 Year

Skills:
  • Python
  • PySpark
  • CI/CD pipelines
  • ARM templates
  • Great Expectations

Screening Questions:
  • Can you explain your experience with building data pipelines using PySpark?
  • How do you ensure data consistency and scalability across datasets?
  • Have you worked with ARM templates for automated infrastructure provisioning?
  • How do you integrate data quality checks into CI/CD workflows?
  • Can you provide an example of a data quality rule you have defined and automated?

Key Responsibilities:
Design and implement Silver and Gold layer data models following medallion architecture best practices.
Perform data cleansing, standardization, enrichment, and aggregation to support analytics and reporting.
Build optimized PySpark-based transformations for large-scale data processing (an illustrative sketch follows this list).
Ensure data consistency, performance, and scalability across datasets.
Build and maintain CI/CD pipelines using Git-based workflows (Azure DevOps / GitHub).
Use ARM templates (or IaC equivalents) for automated infrastructure provisioning.
Enable automated deployment of data pipelines, notebooks, and configurations.
Follow DevOps best practices for version control, branching, and release management.
Create modular, maintainable, and testable Python code.
Support automation of metadata logging, alerting, and operational tasks.
Implement data quality libraries such as Great Expectations.
Define and automate data quality rules (completeness, accuracy, freshness, consistency).
Monitor, log, and troubleshoot data quality issues proactively.
Work closely with data architects, analysts, QA, and business stakeholders.
Translate business and analytical requirements into robust data engineering solutions.
Participate in Agile ceremonies and support sprint-based delivery.
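
For illustration only, the following is a minimal sketch of what a Silver/Gold-layer PySpark transformation of this kind could look like. The storage paths, column names, and use of Delta tables are assumptions made for the example, not details taken from this posting.

    # Hypothetical example: cleanse and standardize Bronze data into a Silver
    # table, then aggregate into a Gold table. All names and paths are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("silver_gold_example").getOrCreate()

    # Read raw (Bronze) records; Delta format is assumed to be available.
    bronze_df = spark.read.format("delta").load("/lake/bronze/orders")

    silver_df = (
        bronze_df
        .dropDuplicates(["order_id"])                        # cleansing
        .withColumn("order_date", F.to_date("order_date"))   # standardization
        .withColumn("region", F.upper(F.trim("region")))     # normalization
        .filter(F.col("order_amount").isNotNull())           # basic completeness
    )

    # Gold layer: aggregated, reporting-ready view.
    gold_df = (
        silver_df
        .groupBy("region", "order_date")
        .agg(
            F.sum("order_amount").alias("daily_revenue"),
            F.count("order_id").alias("order_count"),
        )
    )

    silver_df.write.format("delta").mode("overwrite").save("/lake/silver/orders")
    gold_df.write.format("delta").mode("overwrite").save("/lake/gold/daily_revenue")

In practice, transformations like this would be parameterized and promoted through the Git-based CI/CD workflow described above.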
Required Skills and Qualifications:
Strong hands-on experience with Silver and Gold layer development, Python (automation and data processing), and PySpark.
Working knowledge of Great Expectations (data quality framework); see the sketch below.
Experience with CI/CD pipelines using Git-based tools.
Hands-on experience with ARM templates or infrastructure-as-code concepts.
Strong understanding of data modeling and the medallion architecture.
Experience working with large datasets in distributed environments.
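
As a rough illustration of the Great Expectations requirement, the sketch below defines and automates a few data quality rules (completeness, accuracy, consistency) using the classic pre-1.0 SparkDFDataset API. The column names, value ranges, and region set are hypothetical.

    # Hypothetical data quality checks with Great Expectations (classic API).
    from great_expectations.dataset import SparkDFDataset

    def run_quality_checks(silver_df):
        """Wrap a Spark DataFrame and enforce illustrative quality rules."""
        checked = SparkDFDataset(silver_df)

        # Completeness: key identifiers must always be present.
        checked.expect_column_values_to_not_be_null("order_id")

        # Accuracy: amounts must fall within a plausible range.
        checked.expect_column_values_to_be_between(
            "order_amount", min_value=0, max_value=1_000_000
        )

        # Consistency: region codes limited to an approved set.
        checked.expect_column_values_to_be_in_set(
            "region", ["NORTHEAST", "SOUTH", "MIDWEST", "WEST"]
        )

        results = checked.validate()
        if not results.success:
            # In a CI/CD workflow, a failure here would block the release or run.
            raise ValueError(f"Data quality validation failed: {results}")
        return results

Hooked into the CI/CD pipeline, a failed validation would stop the affected dataset from being promoted.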
Good to Have:
Microsoft Certified: Azure Data Engineer Associate or Azure Enterprise Data Analyst Associate.
Waste management or oil and gas domain knowledge.

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala