We're building an AI-powered conversational system for drive-thru automation. As our Data Engineer, you'll design and implement the infrastructure that powers our multi-stage LLM pipeline, from data capture to processing, model training, and deployment.
Tasks
- Build scalable real-time data pipelines for audio processing, LLM interactions, and model training
- Design comprehensive data storage solutions across object storage, NoSQL, and analytical databases
- Implement data quality management with filtering, normalization, and enrichment capabilities
- Create automated processes for data preparation, model evaluation, and continuous improvement
- Develop observability systems with monitoring, alerting, and performance dashboards
- Establish data security and compliance protocols, including privacy protection measures
- Build resilient data systems with error recovery, backup, and integrity verification
Requirements
What You'll Need
- Experience designing data pipelines for AI/ML applications
- Expertise with Apache Airflow for workflow orchestration
- Strong knowledge of Apache Spark for large-scale data processing
- Experience with Apache Kafka for real-time event streaming
- Proficiency with object storage systems (S3/MinIO) and database technologies (Cassandra/ScyllaDB, ClickHouse)
- Understanding of monitoring tools (OpenTelemetry) and observability platforms
- Experience implementing data security and compliance measures
- Advanced Python programming skills
Preferred Experience
- Audio data processing and conversational AI systems
- LLM training and fine-tuning pipelines
- Data quality frameworks (Great Expectations) and versioning tools (LakeFS, DVC)
- Kubernetes for container orchestration
- Multi-region deployment and distributed systems
Benefits
- Build cutting-edge conversational AI systems with real-world impact
- Work with a modern open-source technology stack
- Help shape the future of automated customer service
- Competitive compensation and flexible work arrangements
If you're passionate about building robust data systems for AI applications and excited by complex real-time data challenges, we'd love to talk.