About Us
At Qloo, we harness large-scale behavioral and catalog data to power recommendations and insights across entertainment, dining, travel, retail, and more. Our platform is built on a modern AWS data stack and supports the analytics, APIs, and machine-learning models used by leading brands. We are looking for an experienced Data Engineer to help evolve and scale this platform.
Role Overview
As a Data Engineer at Qloo, you will design, build, and operate the pipelines that move data from external vendors, internal systems, and public sources into our S3-based data lake and downstream services. You'll work across AWS Glue, EMR (Spark), Athena/Hive, and Airflow (MWAA) to ensure that our data is accurate, well-modeled, and efficiently accessible for analytics, indexing, and machine-learning workloads.
You should be comfortable owning end-to-end data flows, from ingestion and transformation to quality checks, monitoring, and performance tuning.
Responsibilities
- Design, develop, and maintain batch data pipelines using Python, Spark (EMR), and AWS Glue, loading data from S3, RDS, and external sources into Hive/Athena tables (a brief PySpark sketch follows this list).
- Model datasets in our S3/Hive data lake to support analytics (Hex), API use cases, Elasticsearch indexes, and ML models.
- Implement and operate workflows in Airflow (MWAA), including dependency management, scheduling, retries, and alerting via Slack (a minimal DAG sketch also appears after this list).
- Build robust data quality and validation checks (schema validation, freshness/volume checks, anomaly detection) and ensure issues are surfaced quickly with monitoring and alerts.
- Optimize jobs for cost and performance (partitioning, file formats, join strategies, proper use of EMR/Glue resources).
- Collaborate closely with data scientists, ML engineers, and application engineers to understand data requirements and design schemas and pipelines that serve multiple use cases.
- Contribute to internal tooling and shared libraries that make working with our data platform faster, safer, and more consistent.
- Document pipelines, datasets, and best practices so the broader team can easily understand and work with our data.
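
To give a concrete flavor of the pipeline work above, here is a minimal sketch of a PySpark batch load, assuming a hypothetical vendor feed: read raw data from S3, apply light cleanup, and write partitioned Parquet that an Athena/Hive table can sit on. All bucket names, paths, and columns are illustrative placeholders, not Qloo infrastructure.

```python
# Hypothetical sketch only: bucket names, paths, and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example_vendor_load").getOrCreate()

# Read one day's raw vendor drop from the landing area of the lake.
raw = spark.read.json("s3://example-landing-bucket/vendor_x/2024-01-01/")

# Light cleanup plus a partition column for efficient downstream queries.
cleaned = (
    raw.dropDuplicates(["record_id"])
       .withColumn("ingest_date", F.lit("2024-01-01"))
)

# Columnar format + partitioning keeps Athena scans cheap: a query that
# filters on ingest_date reads only that partition's files.
(
    cleaned.write
        .mode("overwrite")
        .partitionBy("ingest_date")
        .parquet("s3://example-lake-bucket/vendor_x/")
)
```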
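And here is a minimal sketch of how such a job might be orchestrated in Airflow, showing scheduling, retries, and a task dependency. The DAG id, task names, and callables are hypothetical; in practice, Slack alerting would typically hook in via a failure callback.

```python
# Hypothetical sketch only: dag_id, schedule, and callables are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3():
    print("placeholder: pull the day's raw files from the landing bucket")


def validate_and_load():
    print("placeholder: run schema/freshness checks, then write partitions")


default_args = {
    "retries": 2,                        # retry transient failures automatically
    "retry_delay": timedelta(minutes=5),
    # In practice an on_failure_callback (e.g., via Airflow's Slack provider)
    # would send alerts to a team channel.
}

with DAG(
    dag_id="example_vendor_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="validate_and_load", python_callable=validate_and_load)

    extract >> load                      # load runs only after extract succeeds
```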
Qualifications
- Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
- Experience with Python and distributed data processing using Spark (PySpark) on EMR or a similar environment.
- Hands-on experience with core AWS data services, ideally including:
  - S3 (data lake partitioning, lifecycle management)
  - AWS Glue (jobs, crawlers, catalogs)
  - EMR or other managed Spark platforms
  - Athena/Hive and SQL for querying large datasets
  - Relational databases such as RDS (PostgreSQL/MySQL or similar)
- Experience building and operating workflows in Airflow (MWAA experience is a plus).
- Strong SQL skills and familiarity with data modeling concepts for analytics and APIs.
- Solid understanding of data quality practices (testing, validation frameworks, monitoring/observability).
- Comfortable working in a collaborative environment, managing multiple projects, and owning systems end-to-end.
We Offer
- Competitive salary and benefits package, including health insurance, retirement plan, and paid time off.
- The opportunity to shape a modern cloud-based data platform that powers real products and ML experiences.
- A collaborative low-ego work environment where your ideas are valued and your contributions are visible.
- Flexible work arrangements (remote and hybrid options) and a healthy respect for work-life balance.