Design, build, and maintain scalable data pipelines, streaming infrastructure, and AI/ML data workflows that power data-driven products and enterprise AI solutions, ensuring reliable, timely, and high-quality data is available across the organization so that AI Engineers, Product teams, and enterprise clients can make accurate, insight-driven decisions and deliver intelligent customer experiences through Kata's AI and voice platforms.
Qualifications :
Qualifications & Education :
- Bachelor's degree in Computer Science, Information Systems, Data Engineering, Statistics, or a related field
- Relevant certifications (GCP Professional Data Engineer, Databricks, Airflow/Astronomer, etc.) are a plus
Technical Skills :
- Streaming: Apache Kafka, including topic design, consumer groups, partitioning strategy, and real-time event processing
- Batch Orchestration: Apache Airflow, including DAG design, scheduling, dependency management, and failure handling
- Distributed Processing: Apache Spark for batch and micro-batch transformations, with DataFrame API optimization
- Data Warehousing: Google BigQuery (primary); Apache Hive for large-scale batch analytics
- NoSQL / Wide-Column: Apache Cassandra data modeling for high-write, time-series, and event-driven workloads
- Languages: Python (required); SQL (required); Scala is a plus
- Cloud: GCP (BigQuery, Dataflow, Cloud Storage, Pub/Sub, Vertex AI Pipelines); Azure is a plus
- Containerization: Docker; basic Kubernetes for deploying data services
- CI/CD: GitLab CI, GitHub Actions, or equivalent for pipeline deployment automation
- Data Quality: Great Expectations, dbt tests, or custom validation frameworks
- Monitoring: Prometheus, Grafana, or GCP Monitoring for pipeline observability; alerting on SLA breaches
- Version Control: Git with feature branching and pull request workflows
Experience :
Associate Level (1-2 years)
- 1-2 years of professional experience in data engineering, software engineering with a data focus, or a related technical role
- Hands-on experience building or maintaining data pipelines in a production environment
- Practical exposure to at least one streaming or batch processing technology (Kafka, Spark, or Airflow)
- Familiarity with SQL and relational or columnar databases (BigQuery, PostgreSQL, Hive, or equivalent)
- Exposure to cloud data services on GCP or Azure
- Experience working in Agile/Scrum teams with sprint-based delivery
Mid Level (3-5 years)
- 3-5 years of professional experience in data engineering, with at least 2 years building and operating production-grade pipelines
- Proven hands-on experience with Apache Kafka for real-time event streaming, including topic design, consumer group management, and at-least-once/exactly-once delivery patterns
- Demonstrated experience designing and maintaining batch workflows using Apache Airflow and large-scale data transformations with Apache Spark
- Experience working with BigQuery and/or Hive for large-scale analytics workloads including query optimization and partitioning strategies
- Hands-on experience with Cassandra or similar NoSQL wide-column stores for high-write or time-series data use cases
- Experience supporting AI/ML data pipelines: feature engineering, training dataset preparation, or model inference data feeds
- Experience with data quality frameworks and implementing data observability practices in production environments
Additional Information :
We offer flexible working hours for our employees.
Most importantly, we provide a learning experience in the Conversational AI industry.
Remote Work :
Yes
Employment Type :
Full-time