As a Senior Platform Engineer, you will lead the design, construction, and operation of a robust infrastructure that supports core AI capabilities and high-security enterprise solutions. You will bridge the gap between AI engineering and reliable systems operations, ensuring that development teams can deploy advanced models and applications with maximum efficiency, security, and scalability.
Key Responsibilities
AI Infrastructure & LLMOps: Develop and maintain a flexible LLMOps platform, ensuring reliability and staying current with the latest advancements in Large Language Models (LLMs).
Enterprise Solution Architecture: Support infrastructure design for security, reliability, and availability, specifically tailored for enterprise-level clients.
Automation & Optimization: Lead the automation of deployment pipelines and various operations using CI/CD tools to enhance development efficiency.
Pipeline Management: Optimize data processing and model training pipelines in collaboration with AI engineers and data scientists.
Systems Reliability: Build and operate monitoring environments for fault detection and capacity planning to improve overall service reliability.
Cross-Functional Leadership: Collaborate with SREs, AI engineers, domain experts, and product teams to promote technical coordination and best practices.
Requirements
Must-Have Skills:
Experience: 5 years of professional software development experience, with at least 3 years focused on backend or infrastructure systems.
Cloud & Containers: Proven experience in cloud architecture (AWS, GCP, or Azure) and container orchestration using Kubernetes.
CI/CD & DevOps: Practical experience building and operating CI/CD pipelines and utilizing Infrastructure as Code (IaC).
AI/LLMOps Principles: Practical knowledge of LLMOps principles and the tools required to manage AI life cycles.
Engineering Excellence: Deep understanding of software engineering best practices, including rigorous testing, code reviews, and performance optimization.
Communication: Strong problem-solving and communication skills in Japanese (Business-level/N2 equivalent) and preferably English.
Nice-to-Have Skills:
Experience with machine learning ecosystems like TensorFlow or PyTorch.
Proficiency in Python, Rust, or Java.
Experience in enabling platforms for large engineering teams (50 members).
Experience with advanced monitoring tools like Datadog or Grafana.
Benefits
Salary & Financials:
Work-Life Balance:
Flexibility: Full-flex or super-flex systems with core hours (e.g., 11:00-16:00) or no core hours at all.
Remote Work: Hybrid or remote-first work styles with occasional office meetings for team alignment.
Holidays: 120 days off annually, including weekends, national holidays, New Year's, and special leave (sick leave, birthday, and refresh leave).
Professional Growth & Support:
Tooling: Access to premium AI tools (ChatGPT Enterprise, Cursor, GitHub Copilot) and high-spec hardware.
Allowances (Learning): Monthly budget for server costs (up to 10,000 JPY), books, and language learning.
Additional: Side jobs are permitted (with prior approval).