Tech Lead-Data Engineer
Job Location:
Malvern, PA - USA
Monthly Salary:
Not Disclosed
Posted on:
7 days ago
Vacancies:
1 Vacancy
Job Summary
Role Name - Tech Lead-Data Engineer
Malvern, PA - Onsite
Skills: Java, AWS, Python, PySpark, Event-Driven Pipelines, Data Architecture
Summary
We are seeking an experienced Tech Lead-Data Engineer (15 years of experience) with a strong background in Java, AWS, Python, PySpark, and event-driven architectures. You will design and build scalable batch and streaming data pipelines, optimize cloud data platforms, and deliver high-quality, reliable datasets that support analytics, reporting, and machine learning workloads.
Key Responsibilities
Architect, build, and maintain event-driven data pipelines using AWS services such as Kinesis, MSK/Kafka, Lambda, Step Functions, SQS/SNS, and Glue/EMR.
Develop ETL/ELT workflows using Python and PySpark, ensuring performance, scalability, and cost efficiency.
Implement and optimize Spark-based data transformations, partitioning strategies, and data processing frameworks.
Design and manage data lake and warehouse structures using S3, Glue Catalog, Athena, and/or Redshift.
Build streaming solutions with checkpointing, stateful transformations, idempotency, and schema evolution.
Ensure high standards of data quality, observability, monitoring, and alerting (CloudWatch, Datadog, etc.).
Implement data security best practices, including IAM, encryption (KMS), networking, and governance.
Create reusable frameworks, internal libraries, and CI/CD pipelines for automated deployments.
Collaborate with data scientists, analysts, and business teams to deliver well-modeled, reliable datasets.
Lead design reviews, mentor junior engineers, and contribute to engineering best practices.
Required Qualifications
15 years of professional experience in Data Engineering.
Strong expertise in Python and PySpark for large-scale data processing.
Advanced hands-on experience with AWS (S3, Glue, EMR, Lambda, Step Functions, Kinesis/MSK, DynamoDB, Athena, Redshift).
Deep experience building event-driven and streaming data pipelines.
Strong SQL experience for analytical and ETL workloads.
Hands-on experience with workflow orchestration tools such as Airflow or Step Functions.
Experience with CI/CD, Git, and Infrastructure-as-Code (Terraform or CloudFormation).
Strong understanding of distributed systems, Spark performance tuning, data modeling, and cloud cost optimization.
Knowledge of data security, encryption, networking, and compliance best practices in cloud environments.