Title: Data Engineer
Location: Alpharetta, GA or Remote
Responsibilities:
- Design and implement robust, production-grade pipelines using Python, Spark, SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
- Lead efforts to canonicalize raw healthcare data (837 claims, EHR partner data, flat files) into internal models.
- Own the full lifecycle of core pipelines - from file ingestion to validated, queryable datasets - ensuring high reliability and performance.
- Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
- Refactor and scale existing pipelines to meet growing data and business needs.
- Tune Spark jobs and optimize distributed processing performance.
- Implement schema enforcement and versioning aligned with internal data standards.
- Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform SMEs, and AMs to ensure pipelines meet evolving business needs.
- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
- Contribute to the evolution of our data platform - driving toward mature patterns in observability, testing, and automation.
- Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
- Help develop and champion internal best practices around pipeline development and data modeling.
Skillset:
- 10 years of experience as a Data Engineer (or equivalent) building production-grade pipelines.
- Strong expertise in Python, Spark, SQL, and Airflow.
- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
- Experience mapping and standardizing raw external data into canonical models.
- Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
- Experience onboarding new customers and integrating external customer data with non-standard formats.
- Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
- Strong written and verbal communication skills - able to explain technical concepts to non-engineering partners.
- Comfortable designing pipelines from scratch and improving existing pipelines.
- Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
- Experience building or willingness to learn streaming pipelines using tools such as Kafka or SQS.
- Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
- Bonus: Prior experience working on complex migration projects.
Best Regards,
Bindu M
Phone:
Email: