Data Engineer

TechniPros


Job Location: Alpharetta, GA - USA
Monthly Salary: Not Disclosed
Posted on: 2 hours ago
Vacancies: 1 Vacancy

Job Summary

Title: Data Engineer
Location: Alpharetta, GA, or Remote

Responsibilities:

  • Design and implement robust, production-grade pipelines using Python, Spark, SQL, and Airflow to process high-volume, file-based datasets (CSV, Parquet, JSON); a minimal sketch follows this list.
  • Lead efforts to canonicalize raw healthcare data (837 claims, EHR partner data, flat files) into internal models.
  • Own the full lifecycle of core pipelines, from file ingestion to validated, queryable datasets, ensuring high reliability and performance.
  • Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
  • Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
  • Refactor and scale existing pipelines to meet growing data and business needs.
  • Tune Spark jobs and optimize distributed processing performance.
  • Implement schema enforcement and versioning aligned with internal data standards.
  • Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
  • Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data-flow issues.
  • Contribute to the evolution of our data platform, driving toward mature patterns in observability, testing, and automation.
  • Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
  • Help develop and champion internal best practices around pipeline development and data modeling.
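
The kind of pipeline these bullets describe can be illustrated with a minimal, hedged sketch: a PySpark job that enforces an explicit schema on raw CSV input, applies a simple data quality gate, and writes Parquet idempotently by overwriting only the partitions it touches. All paths and column names here are hypothetical, not part of this posting.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DateType

# Hypothetical canonicalization job: raw CSV claims in, partitioned Parquet out.
spark = (
    SparkSession.builder.appName("canonicalize_claims")
    # Overwrite only the partitions this run produces, so re-runs are idempotent.
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

# Explicit schema enforcement instead of inference (columns are hypothetical).
schema = StructType([
    StructField("claim_id", StringType(), nullable=False),
    StructField("member_id", StringType(), nullable=True),
    StructField("service_date", DateType(), nullable=True),
])

raw = spark.read.csv("s3://raw-bucket/claims/", header=True, schema=schema)

# Map raw fields into the canonical model.
canonical = raw.withColumn("claim_id", F.upper(F.trim(F.col("claim_id"))))

# Simple data quality gate: fail the run rather than publish bad data.
bad_rows = canonical.filter(
    F.col("claim_id").isNull() | F.col("service_date").isNull()
).count()
if bad_rows:
    raise ValueError(f"validation failed: {bad_rows} rows missing claim_id/service_date")

(
    canonical.write.mode("overwrite")  # dynamic mode replaces only touched partitions
    .partitionBy("service_date")
    .parquet("s3://canonical-bucket/claims/")
)
```

In practice a job like this would run as one task in an Airflow DAG, with the validation gate surfacing failures to on-call rather than silently passing bad data downstream.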

Skillset:

  • 10 years of experience as a Data Engineer (or equivalent) building production-grade pipelines.
  • Strong expertise in Python, Spark, SQL, and Airflow.
  • Experience processing large-scale, file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
  • Experience mapping and standardizing raw external data into canonical models.
  • Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
  • Experience onboarding new customers and integrating external customer data with non-standard formats.
  • Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
  • Strong written and verbal communication skills; able to explain technical concepts to non-engineering partners.
  • Comfortable designing pipelines from scratch and improving existing pipelines.
  • Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
  • Experience building, or willingness to learn, streaming pipelines using tools such as Kafka or SQS (see the streaming sketch after this list).
  • Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
  • Bonus: Prior experience working on complex migration projects.
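
For the streaming item above, a comparable hedged sketch using Spark Structured Streaming over Kafka, one possible stack for that bullet. The broker, topic, and paths are hypothetical, and the org.apache.spark:spark-sql-kafka-0-10 connector package must be on the classpath.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_stream").getOrCreate()

# Consume a hypothetical topic as an unbounded DataFrame.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "claims-events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to strings before persisting.
decoded = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("value"),
    "timestamp",
)

# The checkpoint location lets the query resume after restarts without
# reprocessing committed offsets; both paths are hypothetical.
query = (
    decoded.writeStream.format("parquet")
    .option("path", "s3://stream-bucket/claims-events/")
    .option("checkpointLocation", "s3://stream-bucket/checkpoints/claims-events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```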

Best Regards,

Bindu M
Phone:
Email:

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala