Role Overview
We are seeking an experienced Lead Data Engineer to join our Data Engineering team at Paytm, India's leading digital payments and financial services platform. This is a critical role responsible for designing, building, and maintaining large-scale, real-time data streams that process billions of transactions and user interactions daily. Data accuracy and stream reliability are essential to our operations, as data quality issues can result in financial losses and erode customer trust.
This role requires expertise in designing fault-tolerant, scalable data architectures that maintain high uptime standards while processing peak transaction loads during festivals and other high-traffic events. We place the highest priority on data quality and system reliability, as our customers depend on accurate, timely information for their financial decisions. You'll collaborate with cross-functional teams, including data scientists, product managers, and risk engineers, to deliver data solutions that enable real-time fraud detection, personalized recommendations, credit scoring, and regulatory compliance reporting.
Key technical challenges include maintaining data consistency across distributed systems with demanding performance requirements, implementing comprehensive data quality frameworks with real-time validation, optimizing query performance on large datasets, and ensuring complete data lineage and governance across multiple business domains. At Paytm, reliable data streams are fundamental to our operations and to our commitment to protecting customers' financial security and maintaining India's digital payments infrastructure.
Data Stream Architecture & Development
- Design and implement reliable, scalable data streams handling high-volume transaction data with strong data integrity controls
- Build real-time processing systems using modern data engineering frameworks (Java/Python stack) with excellent performance characteristics
- Develop robust data ingestion systems from multiple sources with built-in redundancy and monitoring capabilities
- Implement comprehensive data quality frameworks ensuring the 4 Cs (Completeness, Consistency, Conformity, and Correctness), delivering data reliability that supports sound business decisions
- Design automated data validation, profiling, and quality monitoring systems with proactive alerting capabilities
Infrastructure & Platform Management
- Manage and optimize distributed data processing platforms with high availability requirements to ensure consistent service delivery
- Design data lake and data warehouse architectures with appropriate partitioning and indexing strategies for optimal query performance
- Implement CI/CD processes for data engineering workflows with comprehensive testing and reliable deployment procedures
- Ensure high availability and disaster recovery for critical data systems to maintain business continuity
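To make the "4 Cs" data quality idea concrete, here is a minimal, hedged sketch of a record-level validator. The field names (`txn_id`, `amount`, etc.), allowed currencies, and rules are illustrative assumptions, not Paytm's actual schema or framework:

```python
from datetime import datetime

# Hypothetical transaction schema, for illustration only.
REQUIRED_FIELDS = {"txn_id", "amount", "currency", "timestamp"}
ALLOWED_CURRENCIES = {"INR", "USD"}

def validate_record(record: dict) -> dict:
    """Check one record against the 4 Cs: Completeness, Consistency,
    Conformity, and Correctness. Returns a pass/fail flag per dimension."""
    # Completeness: every required field is present and non-null.
    completeness = all(record.get(f) is not None for f in REQUIRED_FIELDS)

    # Conformity: values match expected types and value domains.
    amount = record.get("amount")
    conformity = (
        isinstance(amount, (int, float))
        and record.get("currency") in ALLOWED_CURRENCIES
    )

    # Correctness: values are plausible (e.g. a payment amount is positive).
    correctness = isinstance(amount, (int, float)) and amount > 0

    # Consistency: cross-field rule, e.g. the event time is not in the future.
    try:
        ts = datetime.fromisoformat(record.get("timestamp", ""))
        consistency = ts <= datetime.now()
    except (TypeError, ValueError):
        consistency = False

    return {
        "completeness": completeness,
        "consistency": consistency,
        "conformity": conformity,
        "correctness": correctness,
    }
```

In a real stream, a check like this would typically run per micro-batch (e.g. in Spark), with failing records routed to a quarantine topic and per-dimension pass rates emitted as metrics for alerting.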
Performance & Optimization
- Monitor and optimize streaming performance with a focus on latency reduction and operational efficiency
- Implement efficient data storage strategies, including compression, partitioning, and lifecycle management, with cost considerations
- Troubleshoot and resolve complex data streaming issues in production environments with effective response protocols
- Conduct proactive capacity planning and performance tuning to support business growth and data volume increases
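As a small, hedged sketch of the latency-monitoring responsibility above: track per-event processing latencies in a sliding window and alert when the p99 crosses a threshold. The window size, threshold, and nearest-rank percentile method are illustrative choices, not a production-tuned design:

```python
from collections import deque

class LatencyMonitor:
    """Sliding-window latency tracker with a simple p99 alert rule.
    Window size and threshold here are illustrative assumptions."""

    def __init__(self, window: int = 1000, p99_threshold_ms: float = 500.0):
        self.samples = deque(maxlen=window)  # keeps only the latest N latencies
        self.p99_threshold_ms = p99_threshold_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p99(self) -> float:
        ordered = sorted(self.samples)
        # Nearest-rank percentile: index of the 99th-percentile sample.
        idx = max(0, int(len(ordered) * 0.99) - 1)
        return ordered[idx]

    def should_alert(self) -> bool:
        return bool(self.samples) and self.p99() > self.p99_threshold_ms
```

In practice the same idea is usually delegated to Prometheus histograms and Grafana alert rules rather than application code, but the windowed-percentile logic is the same.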
Collaboration & Leadership
- Work closely with data scientists, analysts, and product teams to understand key data requirements and service level expectations
- Mentor junior data engineers with emphasis on data quality best practices and a customer-focused approach
- Participate in architectural reviews and help establish data engineering standards that prioritize reliability and accuracy
- Document technical designs, processes, and operational procedures with a focus on maintainability and knowledge sharing
Required Qualifications
Experience & Education
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- 7 years (Senior) of hands-on data engineering experience
- Proven experience with large-scale data processing systems (preferably in the fintech/payments domain)
- Experience building and maintaining production data streams processing TB/PB-scale data with strong performance and reliability standards
Technical Skills & Requirements
- Programming Languages: Expert-level proficiency in both Python and Java; experience with Scala preferred
- Big Data Technologies: Apache Spark (PySpark, Spark SQL, Spark with Java), Apache Kafka, Apache Airflow
- Cloud Platforms: AWS (EMR, Glue, Redshift, S3, Lambda) or equivalent Azure/GCP services
- Databases: Strong SQL skills; experience with both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Cassandra, Redis) databases
- Data Quality Management: Deep understanding of the 4 Cs framework: Completeness, Consistency, Conformity, and Correctness
- Data Governance: Experience with data lineage tracking, metadata management, and data cataloging
- Data Formats & Protocols: Parquet, Avro, JSON, REST APIs, GraphQL
- Containerization & DevOps: Docker, Kubernetes, Git, GitLab/GitHub, with CI/CD pipeline experience
- Monitoring & Observability: Experience with Prometheus, Grafana, or similar monitoring tools
- Data Modeling: Dimensional modeling, data vault, or similar methodologies
- Streaming Technologies: Apache Flink, Kinesis, or Pulsar experience is a plus
- Infrastructure as Code: Terraform, CloudFormation (preferred)
- Java-specific: Spring Boot, Maven/Gradle, JUnit for building robust data services
Preferred Qualifications
Domain Expertise
- Previous experience in the fintech, payments, or banking industry with a solid understanding of regulatory compliance and financial data requirements
- Understanding of financial data standards, PCI DSS compliance, and data privacy regulations, where compliance is essential for business operations
- Experience with real-time fraud detection or risk management systems, where data accuracy is crucial for customer protection
Advanced Technical Skills (Preferred)
- Experience building automated data quality frameworks covering all 4 Cs dimensions
- Knowledge of machine learning workflow orchestration (MLflow, Kubeflow)
- Familiarity with data mesh or federated data architecture patterns
- Experience with change data capture (CDC) tools and techniques
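As a hedged, simplified illustration of the change-data-capture concept listed above: log-based CDC tools (e.g. Debezium) tail the database WAL/binlog, but the events they emit boil down to keyed insert/update/delete changes, which this sketch derives by diffing two snapshots (the snapshot shape and event format are assumptions for illustration):

```python
def diff_snapshots(before: dict, after: dict) -> list:
    """Compare two {primary_key: row} snapshots and emit CDC-style events.
    A log-based CDC tool reads these changes from the WAL/binlog instead
    of diffing, which is cheaper and captures intermediate states."""
    events = []
    for key, row in after.items():
        if key not in before:
            events.append({"op": "insert", "key": key, "row": row})
        elif before[key] != row:
            events.append({"op": "update", "key": key,
                           "before": before[key], "row": row})
    for key, row in before.items():
        if key not in after:
            events.append({"op": "delete", "key": key, "before": row})
    return events
```

Downstream, such events are typically published to Kafka and applied in order to keep a warehouse table or cache in sync with the source system.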
Leadership & Soft Skills
- Strong problem-solving abilities, with experience debugging complex distributed systems in production environments
- Excellent communication skills, with the ability to explain technical concepts to diverse stakeholders while highlighting business value
- Experience mentoring team members and leading technical initiatives with a focus on building a quality-oriented culture
- Proven track record of delivering projects successfully in dynamic, fast-paced financial technology environments
What We Offer
- Opportunity to work with cutting-edge technology at scale
- Competitive salary and equity compensation
- Comprehensive health and wellness benefits
- Professional development opportunities and conference attendance
- Flexible working arrangements
- Chance to impact millions of users across India's digital payments ecosystem
Application Process
Interested candidates should submit:
- Updated resume highlighting relevant data engineering experience, with emphasis on real-time systems and data quality
- Portfolio or GitHub profile showcasing data engineering projects, particularly those involving high-throughput streaming systems
- Cover letter explaining your interest in the fintech/payments domain and your understanding of data criticality in financial services
- References from previous technical managers or senior colleagues who can attest to your data quality standards