Senior Python Data Engineer (AWS Focus) - Hyderabad - Permanent & Onsite Opportunity
Description
We are seeking a highly skilled Python Data Engineer with deep expertise in AWS-based data solutions. This role is responsible for designing, building, and optimizing large-scale data pipelines and frameworks that power analytics and machine learning workloads. You'll lead the modernization of legacy systems by migrating workloads from platforms like Teradata to AWS-native big data environments such as EMR, Glue, and Redshift. Strong emphasis is placed on reusability, automation, observability, and performance optimization.
Key Responsibilities
- Migration & Modernization: Build reusable accelerators and frameworks to migrate data from legacy platforms (e.g., Teradata) to AWS-native architectures such as EMR and Redshift.
- Data Pipeline Development: Design and implement robust ETL/ELT pipelines using Python, PySpark, and SQL on AWS big data platforms (see the first sketch after this list).
- Code Quality & Testing: Drive development standards with test-driven development, unit testing, and automated validation of data pipelines (second sketch below).
- Monitoring & Observability: Build operational tooling and dashboards for pipeline observability, including metrics tracking (latency, throughput, data quality, cost); the third sketch below shows one way this can look.
- Cloud-Native Engineering: Architect scalable, secure data workflows using AWS services like Glue, Lambda, Step Functions, S3, and Athena.
- Collaboration: Partner with internal product teams, data scientists, and external stakeholders to clarify requirements and drive solutions aligned with business goals.
- Architecture & Integration: Work with enterprise architects to evolve the data architecture while securely integrating AWS systems with on-premise or hybrid environments.
- ML Support & Experimentation: Enable data scientists to operationalize machine learning models by providing clean, well-governed datasets at scale.
- Documentation & Enablement: Document solutions thoroughly and provide technical guidance and knowledge sharing to internal engineering teams.
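To give candidates a concrete feel for the pipeline work above, here is a minimal, illustrative PySpark sketch. The bucket paths, column names, and the transform itself are hypothetical placeholders, not an actual pipeline from this team.

```python
# Minimal illustrative ETL step: read raw order records from S3, normalize
# them, and write a partitioned, curated Parquet dataset back to S3.
# All paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

def transform_orders(df):
    """Trim ids, cast amounts to a fixed-precision decimal, drop bad rows."""
    return (
        df.withColumn("order_id", F.trim(F.col("order_id")))
          .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
          .filter(F.col("amount").isNotNull())
    )

raw = spark.read.parquet("s3://example-raw-bucket/orders/")  # hypothetical path
clean = transform_orders(raw)
(clean.write.mode("overwrite")
      .partitionBy("order_date")                        # assumes this column exists
      .parquet("s3://example-curated-bucket/orders/"))  # hypothetical path
```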
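In the same spirit, a minimal pytest sketch of the test-driven development expected here, exercising the hypothetical transform_orders() from the previous sketch on a local Spark session:

```python
# Unit test for the hypothetical transform_orders() above, run on a local
# Spark session so it needs no AWS resources.
import pytest
from pyspark.sql import SparkSession
from etl.orders import transform_orders  # hypothetical module layout

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_transform_orders_trims_ids_and_drops_null_amounts(spark):
    raw = spark.createDataFrame(
        [(" A1 ", "10.50", "2024-01-01"), ("A2", None, "2024-01-01")],
        ["order_id", "amount", "order_date"],
    )
    rows = transform_orders(raw).collect()
    assert len(rows) == 1               # the null-amount row is filtered out
    assert rows[0]["order_id"] == "A1"  # whitespace trimmed
```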
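And for the observability responsibility, a minimal boto3 sketch of publishing per-run metrics to CloudWatch; the namespace, metric names, and dimension values are illustrative assumptions:

```python
# Publish per-run latency and throughput metrics so CloudWatch dashboards
# and alarms can track pipeline health. Namespace/metric names are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_pipeline_metrics(pipeline: str, latency_s: float, rows_out: int) -> None:
    cloudwatch.put_metric_data(
        Namespace="DataPlatform/Pipelines",  # hypothetical namespace
        MetricData=[
            {"MetricName": "RunLatency", "Value": latency_s, "Unit": "Seconds",
             "Dimensions": [{"Name": "Pipeline", "Value": pipeline}]},
            {"MetricName": "RowsWritten", "Value": float(rows_out), "Unit": "Count",
             "Dimensions": [{"Name": "Pipeline", "Value": pipeline}]},
        ],
    )

emit_pipeline_metrics("orders-daily-etl", latency_s=42.0, rows_out=1_000_000)
```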
Qualifications
- Experience: 7 years in technology roles, with at least 4 years in data engineering, software development, and distributed systems.
Programming:
- Expert in Python and PySpark (Scala is a plus)
- Deep understanding of software engineering best practices
AWS Expertise:
- 3 years of hands-on experience in the AWS data ecosystem
- Proficient in AWS Glue, S3, Redshift, EMR, Athena, Step Functions, and Lambda
- Experience with AWS Lake Formation and data cataloging tools is a plus
- AWS Data Analytics or Solutions Architect certification is a strong plus
Big Data & MPP Systems:
- Strong grasp of distributed data processing
- Experience with MPP data warehouses such as Redshift, Snowflake, or Databricks on AWS
DevOps & Tooling:
- Experience with version control (GitHub/CodeCommit) and CI/CD tools (CodePipeline, Jenkins, etc.)
- Familiarity with containerization and deployment in Kubernetes or ECS
Data Quality & Governance:
- Experience with data profiling, data lineage, and related tooling
- Understanding of metadata management and data security best practices
Bonus:
- Experience supporting machine learning or data science workflows
- Familiarity with BI tools such as QuickSight, Power BI, or Tableau