About us:
TransPerfect, a recognized leader in translation software with a vibrant start-up spirit, is seeking a creative and passionate Backend Developer to join our innovative Artificial Intelligence (AI) team. As part of this division, you will have the opportunity to shape the future of AI in a global organization. Since its beginnings over 10 years ago with the creation of its first machine translation models, the AI team has become a core driver of the company's innovation in machine translation, generative AI, natural language processing, and automation.
We are looking for an experienced backend developer who is excited about pushing the boundaries of technology and making a lasting impact within the AI space. You will be part of a diverse global team of professionals across the USA, Spain, Portugal, and India. If you are passionate about robust and scalable solutions that bring AI to users, this is the role for you.
About the Role:
As a Backend Developer, you will help us solve the "last mile" of document processing: converting complex, unstructured PDFs into perfectly formatted, editable .docx files. The goal is not just to extract text but to recreate the visual and structural intent of the original document, including nested tables, multi-column layouts, font hierarchies, and styling.
You will lead the research and implementation of our document conversion pipeline. This is a hybrid role requiring you to be both a strategic decision-maker (staying on top of the existing tools) and a hands-on developer (combining engineering and AI skills).
You will be in charge of:
Comparative Analysis: Perform a deep-dive evaluation of commercial (ABBYY, Adobe, AWS Textract) vs. open-source/AI-native (Mistral OCR, Docling, Nougat, LlamaParse) solutions.
Benchmarking: Establish metrics for format fidelity to objectively measure how well a tool recreates headers, footers, tables, and styles.
Pipeline Development: Build a Python-based workflow that integrates OCR engines with document generation libraries (like python-docx or Pandoc).
AI Implementation: Explore and fine-tune Vision-Language Models (VLMs) or LayoutLM-style architectures to improve structural recognition.
Optimization: Solve specific edge cases such as rotated text, low-resolution scans, and complex mathematical notation.
Technical Requirements
Python Mastery: Expert-level Python skills with experience in OpenCV, PyMuPDF, and python-docx.
OCR/Document AI: Deep familiarity with Tesseract, PaddleOCR, and modern Transformer-based document models (LayoutLMv3, Donut, or Nougat).
Format Expertise: A pixel-perfect mindset and an understanding of the nuances of XML-based document formats (OOXML).
LLM Integration: Experience using GPT or Claude models for layout correction and semantic cleanup.
Architectural Vision: Ability to decide when to use an off-the-shelf API versus when to build a custom PyTorch/TensorFlow pipeline.
Nice to Have
Experience with the Pandoc AST (Abstract Syntax Tree) for format conversion.
Background in DTP, typography, or graphic design.
Contributions to open-source OCR or PDF manipulation projects.
Required Experience:
IC
TransPerfect Translations is a translation, e-discovery, and language services company based in New York City. The company serves clients in many fields, such as film, gaming, legal, and healthcare. As of 2012, TransPerfect is "the largest privately owned language services provid ...