Senior Data Engineer (OTT App)
We're seeking an experienced Data Engineer to join our platform team, which handles massive-scale data processing and analytics infrastructure supporting over 650M daily events. The ideal candidate will bridge the gap between raw data collection and actionable insights while supporting our ML initiatives.
Key Responsibilities
- Design, build, and maintain scalable data pipelines to process 650M daily events from our platform
- Architect and implement data lakes and warehouses to support real-time analytics and ML model training
- Ensure data quality, reliability, and accessibility across the organization
- Collaborate with ML teams to create efficient data pipelines for model training and inference
- Optimize data infrastructure for performance, cost, and reliability
- Implement data governance and security best practices
Required Technical Skills
- 3 years of experience in data engineering, including proven experience handling large-scale data processing (100M daily events)
- Expertise in modern data processing technologies:
  - Apache Spark for large-scale data processing
  - Apache Kafka or similar streaming platforms
  - Apache Airflow or similar workflow orchestration tools
  - Cloud data warehouses (Snowflake, BigQuery, or Redshift)
- Strong programming skills in Python and SQL
- Experience with cloud platforms (AWS/GCP/Azure)
- Knowledge of data modeling and ETL best practices
Preferred Skills
- Experience with real-time streaming analytics
- Familiarity with ML operations and ML pipelines
- Knowledge of video streaming/OTT domain
- Experience with data quality monitoring tools
- Understanding of data privacy regulations and compliance
- Contributions to open-source data tools
Preferred Tools & Technologies
- Data Processing: Apache Spark, Apache Flink
- Stream Processing: Apache Kafka, Apache Pulsar
- Workflow Orchestration: Apache Airflow, Dagster
- Data Warehousing: Snowflake, BigQuery
- Data Lakes: Delta Lake, Apache Iceberg
- ML Tools: MLflow, Kubeflow
- Infrastructure: Docker, Kubernetes
- Monitoring: Prometheus, Grafana
- Version Control: Git
Education
- Bachelor's/Master's degree in Computer Science, Data Science, or a related field
- Relevant professional certifications in cloud platforms or data technologies
What We Offer
- Opportunity to work with cutting-edge data technologies at scale
- Opportunity to build data engineering pipelines from the ground up, with complete ownership of those pipelines
- Competitive compensation package
- Professional development and certification support
- Health insurance and other benefits
- Collaborative and innovative work environment
Success Metrics
- Ability to design and implement data pipelines handling 500M daily events
- Reduce data processing latency and improve system reliability
- Successfully support ML initiatives with clean reliable data
- Contribute to data architecture decisions and best practices
- Mentor team members and promote data engineering best practices
The ideal candidate will combine technical expertise with strong problem-solving skills and the ability to work in a fast-paced environment. They should be passionate about building scalable data solutions and have a track record of successfully handling large-scale data processing challenges.