Location: Hyderabad | Mode: Hybrid | Experience: 5 years | Role: Individual Contributor
We are hiring a Senior Data Engineer to support a critical data modernization initiative for a US-based global bank. The role focuses on migrating ETL workloads from legacy platforms (Ab Initio) to Apache Spark on Google Cloud. This is an IC-level position requiring strong hands-on skills in Spark development, schema mapping, and client delivery in a fast-paced, agile setting.
Must-Have Skills (Skill / Skill Depth)
Apache Spark (PySpark)
Must have developed Spark-based pipelines using Python, including transformations (joins, aggregations, filters), schema evolution, and partitioning. Should be capable of debugging Spark jobs and optimizing logic at the code level.
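For illustration, a minimal PySpark sketch of the kind of transformation work expected; the bucket paths, tables, and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn_migration_example").getOrCreate()

# Hypothetical inputs: a transactions extract and an account reference table.
txns = spark.read.parquet("gs://example-bucket/landing/transactions/")
accounts = spark.read.parquet("gs://example-bucket/reference/accounts/")

# Filter, join, and aggregate -- the transformation patterns the role calls out.
daily_totals = (
    txns.filter(F.col("status") == "POSTED")
        .join(accounts, on="account_id", how="inner")
        .groupBy("account_id", "txn_date")
        .agg(F.sum("amount").alias("daily_amount"),
             F.count("*").alias("txn_count"))
)

# Partition the output by date so downstream reads can prune files.
daily_totals.write.mode("overwrite").partitionBy("txn_date").parquet(
    "gs://example-bucket/curated/daily_totals/"
)
```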
Python (ETL scripting)
Must have written ETL scripts for file ingestion (CSV, JSON, Parquet), transformation routines, and validations. Scripts should follow modular design principles and include error handling and logging.
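As a rough sketch of the modular, logged ingestion style described above (file names are placeholders):

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("ingest")

# Map file extensions to reader functions so new formats slot in cleanly.
READERS = {".csv": pd.read_csv, ".json": pd.read_json, ".parquet": pd.read_parquet}

def ingest(path: str) -> pd.DataFrame:
    """Read a CSV/JSON/Parquet file with basic validation and logging."""
    suffix = path[path.rfind("."):].lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported file type: {path}")
    try:
        df = reader(path)
    except Exception:
        logger.exception("Failed to read %s", path)
        raise
    if df.empty:
        logger.warning("File %s produced no rows", path)
    logger.info("Loaded %d rows from %s", len(df), path)
    return df

if __name__ == "__main__":
    frame = ingest("sample_input.csv")  # hypothetical input file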
GCP – BigQuery, GCS
Must have used BigQuery for structured queries and GCS for input/output file staging. Should understand dataset partitioning, IAM roles, and cost-aware design for GCP data services.
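A minimal sketch of GCS staging plus a cost-aware (partition-filtered, parameterized) BigQuery query, assuming placeholder project, bucket, and table names:

```python
from google.cloud import bigquery, storage

# Hypothetical project, bucket, and dataset names.
bq = bigquery.Client(project="example-project")
gcs = storage.Client(project="example-project")

# Stage an input file to GCS before loading or querying.
bucket = gcs.bucket("example-staging-bucket")
bucket.blob("landing/transactions.parquet").upload_from_filename("transactions.parquet")

# Filtering on the date partition column limits the bytes scanned (and the cost).
sql = """
    SELECT account_id, SUM(amount) AS daily_amount
    FROM `example-project.curated.daily_totals`
    WHERE txn_date = @run_date
    GROUP BY account_id
"""
job = bq.query(sql, job_config=bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("run_date", "DATE", "2024-01-01")]
))
for row in job.result():
    print(row.account_id, row.daily_amount)
```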
Schema Mapping & Validation
Should have contributed to schema-level field mapping, transformation-logic definition, and validation of output-data parity post-migration.
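One way a parity check like this is often expressed in Spark; a minimal sketch assuming hypothetical legacy and migrated output paths and a shared business key:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parity_check_example").getOrCreate()

# Hypothetical paths: legacy extract vs. the migrated Spark output.
legacy = spark.read.parquet("gs://example-bucket/legacy_extract/daily_totals/")
migrated = spark.read.parquet("gs://example-bucket/curated/daily_totals/")

# Row-count parity.
print("row counts:", legacy.count(), migrated.count())

# Column-level parity on a key measure, grouped by the mapped business key.
diff = (
    legacy.groupBy("account_id").agg(F.sum("daily_amount").alias("legacy_amount"))
          .join(migrated.groupBy("account_id")
                        .agg(F.sum("daily_amount").alias("migrated_amount")),
                on="account_id", how="full_outer")
          .where(F.coalesce(F.col("legacy_amount"), F.lit(0)) !=
                 F.coalesce(F.col("migrated_amount"), F.lit(0)))
)
print("mismatched keys:", diff.count())
```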
Client-Facing Delivery
Must have participated in requirement workshops, solution walkthroughs, and defect resolution. Should be capable of independently handling delivery documentation and client coordination.
Good-to-Have Skills (Skill / Skill Depth)
Ab Initio (read-only exposure)
Preferred if the candidate has reviewed Ab Initio graphs or mapping sheets to support recreating migration logic in Spark. Hands-on Ab Initio work is not required.
Airflow / Cloud Composer
Helpful if familiar with DAG creation and job orchestration using Airflow or GCP Cloud Composer. Should understand task dependencies and scheduling patterns.
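For context, a minimal Airflow 2.x-style DAG sketch showing task dependencies and a daily schedule; the DAG id and task bodies are placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract step")

def transform():
    print("transform step")

def load():
    print("load step")

with DAG(
    dag_id="example_etl_dag",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Task dependencies: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```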
GCP Dataflow / Pub/Sub
Useful for teams dealing with real-time ingestion. Familiarity with Dataflow architecture and Pub/Sub concepts is preferred but not essential.
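At the "concepts" level, publishing an event to Pub/Sub looks roughly like the sketch below; project, topic, and payload fields are hypothetical:

```python
import json
from google.cloud import pubsub_v1

# Hypothetical project and topic names.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "txn-events")

# Pub/Sub messages are bytes; a JSON payload is a common convention.
payload = json.dumps({"account_id": "A123", "amount": 42.50}).encode("utf-8")
future = publisher.publish(topic_path, data=payload, source="ingest-demo")
print("published message id:", future.result())
```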
Logging and Monitoring
Should have exposure to pipeline monitoring via structured logging, log analyzers, or GCP-native logging frameworks.
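A small sketch of structured (JSON-line) logging in Python, which log analyzers and GCP-native logging can parse into queryable fields; the field names are illustrative:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as single-line JSON so analyzers can index fields."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "pipeline": getattr(record, "pipeline", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("etl")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Structured fields travel via `extra` and land as JSON keys in the output.
logger.info("load complete", extra={"pipeline": "daily_totals"})
```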
CI/CD for Data Pipelines
Awareness of deploying data jobs via Jenkins, GitHub Actions, or Cloud Build is a plus, especially for projects involving frequent iteration.
Full Time