Speech AI Engineer

Tokyo - Japan

Monthly Salary: Not Disclosed

Experience Required: 3-5 years

Posted on: 30+ days ago

Vacancies: 1

Job Summary

We are looking for an AI Engineer to join our "AI and Robotics" team. In this role, you will work on adding new AI-enabled features to our mobile hardware platforms. The team focuses on improving service efficiency for business partners through technologies such as Classical and Deep Learning-based Computer Vision, Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Retrieval-Augmented Generation (RAG).

As an expert in Speech AI, you will handle tasks involving ASR models, Voice Activity Detection, Language Detection, Emotion Detection, Speaker Diarization, and Audio Cleaning. While our AI codebases are primarily in Python, programs running on edge hardware (Jetson boards) are written in C for seamless integration.

Responsibilities

Pipeline Development: Implement end-to-end speech processing pipelines for client-facing projects.
Research: Stay current with the latest achievements and papers in Machine Learning and Speech AI.
Deployment: Write performant, scalable code that can be deployed to a large fleet of remote hardware units.

Requirements

Must-Have Skills:

Programming: Proficiency in Python and solid knowledge of C/C++.
DevOps/Tools: Experience with version control (Git) and containerization (Docker or Podman).
Deep Learning Fundamentals:
- Architectures: Encoder-Decoder, Transformers, RNNs.
- Core Concepts: Supervised/Unsupervised training, classification, regression.
- Evaluation Metrics: WER/CER (Word/Character Error Rate), Cross Entropy.
Speech AI Fundamentals:
- Audio preprocessing and Voice Activity Detection (VAD).
- Speaker Diarization.
Specialized Libraries: Proficiency with the HuggingFace ecosystem, OpenAI Whisper, and NVIDIA NeMo.

Nice-to-Have Skills:

Education: Master's degree in Computer Science or a Deep Learning-related field.
Practical Experience: Deploying ASR systems, Emotion Detection, or Speaker Diarization in real-world environments.
Advanced ASR Knowledge: Model distillation, fine-tuning strategies, and specialized evaluation.
Infrastructure: Knowledge of distributed systems, cloud computing, and high-performance computing (HPC).
Software Engineering: Strong system design, testing, and debugging fundamentals.
Hardware Acceleration: Familiarity with NVIDIA technologies (CUDA, TensorRT, Triton Inference Server).
Language: Ability to read/write Japanese.

Benefits

Work Schedule:

Flex Time: 8 hours/day, 5 days/week (between 07:00 and 22:00).
Remote Work: 2 days/week remote (up to 4 days based on performance).
Extended Leave: Long holiday policy allowing up to 1 month of continuous leave.

Environment:

Language: Fully English-speaking work environment within the Technology team.
Social: Company-sponsored monthly/quarterly team meals and recreational events (BBQs, training camps, etc.).

Financial Benefits:

Paid Leave: 15 days annually (cumulative up to 2 years).
Allowances:
- Full commuter allowance.
- Housing Allowance, Child Allowance, Late-night Allowance.
Growth: Learning and Development Credit Program.
Insurance: Comprehensive Health, Pension, and Employment insurance.
Family Support: Maternity and Paternity leave