Design, build, and maintain scalable data pipelines, streaming infrastructure, and AI/ML data workflows that power data-driven products and enterprise AI solutions, ensuring reliable, timely, and high-quality data is available across the organization so that AI Engineers, Product teams, and enterprise clients can make accurate, insight-driven decisions and deliver intelligent customer experiences through Kata's AI and voice platforms.
Qualifications :
Qualifications & Education :
- Bachelor's degree in Computer Science, Information Systems, Data Engineering, Statistics, or a related field
- Relevant certifications (GCP Professional Data Engineer, Databricks, Airflow/Astronomer, etc.) are a plus
Technical Skills :
- Streaming: Apache Kafka, including topic design, consumer groups, partitioning strategy, and real-time event processing
- Batch Orchestration: Apache Airflow, including DAG design, scheduling, dependency management, and failure handling
- Distributed Processing: Apache Spark for batch and micro-batch transformations, with DataFrame API optimization
- Data Warehousing: Google BigQuery (primary); Apache Hive for large-scale batch analytics
- NoSQL / Wide-Column: Apache Cassandra data modeling for high-write, time-series, and event-driven workloads
- Languages: Python (required); SQL (required); Scala is a plus
- Cloud: GCP (BigQuery, Dataflow, Cloud Storage, Pub/Sub, Vertex AI Pipelines); Azure is a plus
- Containerization: Docker; basic Kubernetes for deploying data services
- CI/CD: GitLab CI, GitHub Actions, or equivalent for pipeline deployment automation
- Data Quality: Great Expectations, dbt tests, or custom validation frameworks
- Monitoring: Prometheus, Grafana, or GCP Monitoring for pipeline observability; alerting on SLA breaches
- Version Control: Git with feature branching and pull request workflows
Experience :
Associate Level (1-2 years)
- 1-2 years of professional experience in data engineering, software engineering with a data focus, or a related technical role
- Hands-on experience building or maintaining data pipelines in a production environment
- Practical exposure to at least one streaming or batch processing technology (Kafka, Spark, or Airflow)
- Familiarity with SQL and relational or columnar databases (BigQuery, PostgreSQL, Hive, or equivalent)
- Exposure to cloud data services on GCP or Azure
- Experience working in Agile/Scrum teams with sprint-based delivery
Mid Level (3-5 years)
- 3-5 years of professional experience in data engineering, with at least 2 years building and operating production-grade pipelines
- Proven hands-on experience with Apache Kafka for real-time event streaming, including topic design, consumer group management, and at-least-once/exactly-once delivery patterns
- Demonstrated experience designing and maintaining batch workflows using Apache Airflow and large-scale data transformations with Apache Spark
- Experience working with BigQuery and/or Hive for large-scale analytics workloads including query optimization and partitioning strategies
- Hands-on experience with Cassandra or similar NoSQL wide-column stores for high-write or time-series data use cases
- Experience supporting AI/ML data pipelines: feature engineering, training dataset preparation, or model inference data feeds
- Experience with data quality frameworks and implementing data observability practices in production environments
Additional Information :
We offer flexible working hours for our employees.
Most importantly, we provide a learning experience in the Conversational AI industry.
Remote Work :
Yes
Employment Type :
Full-time