We are looking for a highly skilled Senior Backend / Data Engineer (Python, GCP, Vertex AI) to work on pi-sentiment, an existing Python-based sentiment analysis and social data pipeline on Google Cloud, for a remote, Europe-based client. For this role we only process candidates who are based in the Philippines and have legal authorization to work in the Philippines.
About the OTA Client
We build analytics tools for creators, influencers, and marketers. We pull data from Instagram, TikTok, Facebook, LinkedIn, X/Twitter, and YouTube, run AI-powered sentiment and keyword analysis on it, and serve it to users through dashboards. Small team, real users, real revenue.
The Role
This is not a greenfield role. The codebase, patterns, components, and infrastructure are already in place. Your work will be extending existing features, fixing bugs, and filling gaps, not designing systems from scratch. We need someone who can drop into an unfamiliar codebase, figure out how it works by reading the code, and start shipping within the first two weeks.
You'll work closely with our Senior Frontend Engineer, shipping schema changes, API contracts, and Supabase tables they consume, so you need to be comfortable reading a codebase and reasoning about how your data surfaces in the product.
How We Work
- Autonomy is the default. We point you at an issue and expect you to own it end to end. We don't assign tasks step by step.
- Proactive communication is non-negotiable. If you're stuck, say so immediately; don't go quiet. A PO and another engineer are available for questions, and we expect you to use them.
What Youll Do
Core Responsibilities
- Extend the Sentiment Pipeline: Work within the existing end-to-end flow (Supabase RPC → data scraping/ingestion → BigQuery → Vertex AI batch sentiment predictions), adding features and fixing bugs without breaking what works
- Add & Maintain Platform Integrations: Extend existing Apify-based adapters across Instagram, TikTok, Facebook, YouTube, LinkedIn, and X/Twitter, handling auth, rate limits, schema drift, and backfills
- Ship Cloud Run Jobs: Modify and add containerized Python jobs following existing patterns (SIGTERM handling, structured logging, idempotent retries)
- Evolve Data Contracts: Change BigQuery schemas and Supabase tables/RPCs without breaking the frontend; coordinate migrations with the frontend engineer
- Tune Models & Prompts: Iterate on Gemini structured outputs (Pydantic schemas, enums) to keep sentiment and keyword extraction accurate across languages and platforms
- Benchmark & Evaluate: Use the existing benchmarking/ suite to compare model configs on cost, latency, and quality
- Write Tests: Add pytest coverage for your changes, with unit, integration, and E2E tests where warranted
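The Cloud Run Jobs pattern named above (SIGTERM handling, structured logging, idempotent retries) can be sketched roughly like this; every name here is illustrative, not taken from the actual pi-sentiment codebase:

```python
import json
import signal

# Rough sketch of the job pattern: catch SIGTERM, emit structured JSON
# logs, and keep the work loop idempotent so a retried job run never
# double-processes an item.

shutdown_requested = False

def _handle_sigterm(signum, frame):
    # Cloud Run sends SIGTERM shortly before killing the container;
    # set a flag so the loop can finish the current item and exit cleanly.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, _handle_sigterm)

def log_json(severity: str, message: str, **fields) -> None:
    # Cloud Logging parses JSON lines on stdout into structured entries.
    print(json.dumps({"severity": severity, "message": message, **fields}))

def process_batch(items: list[str], already_done: set[str]) -> set[str]:
    # Idempotent: items already recorded as done are skipped, so
    # re-running the same batch is safe.
    for item in items:
        if shutdown_requested:
            log_json("WARNING", "shutdown requested, stopping early")
            break
        if item in already_done:
            continue
        # ... real work (scrape, enrich, write) would happen here ...
        already_done.add(item)
        log_json("INFO", "processed", item=item)
    return already_done

done = process_batch(["post-1", "post-2"], set())
```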
What Were Looking For
Required
- Strong Python in production: type hints, Pydantic, pytest, clean module boundaries. Years matter less than evidence; show us code you've shipped.
- GCP under load: Cloud Run, BigQuery, Cloud Storage. You've operated it, not just prototyped.
- SQL that survives review: complex BigQuery or Postgres, window functions, partitioning, query optimization.
- LLM integration in production: you've shipped a feature backed by Vertex AI, OpenAI, or Anthropic, and you know what structured outputs and prompt regressions feel like.
- Cross-stack literacy: you can read a Next.js / TypeScript PR, understand what data it needs, and co-design the contract with our frontend engineer. Writing React is not required.
- Proactive operator: you drive your own work, flag blockers fast, and don't wait to be assigned. See the "This Role Is Not a Fit If..." section below; we mean it.
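As a taste of what "structured outputs" means in practice, here is a minimal Pydantic v2 sketch: an enum-constrained schema rejects out-of-contract LLM responses at the boundary. The schema below is hypothetical, not the actual pi-sentiment contract.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

class CommentAnalysis(BaseModel):
    # Hypothetical response schema for a sentiment + keyword call.
    sentiment: Sentiment
    keywords: list[str]
    language: str

# A well-formed model response validates cleanly...
analysis = CommentAnalysis.model_validate_json(
    '{"sentiment": "positive", "keywords": ["launch", "pricing"], "language": "en"}'
)

# ...while a prompt regression that emits an out-of-enum label fails
# loudly at the boundary instead of poisoning downstream tables.
try:
    CommentAnalysis.model_validate_json(
        '{"sentiment": "meh", "keywords": [], "language": "en"}'
    )
    regression_caught = False
except ValidationError:
    regression_caught = True
```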
Preferred
- Vertex AI Gemini specifically (Batch Prediction, structured JSON output with enums)
- Supabase / PostgreSQL with RLS, RPCs, migrations, multi-tenant patterns
- Apify or similar ingestion platforms for social data
- Data pipeline depth: idempotent backfills, schema evolution, cost engineering (BigQuery slots, batch vs. online)
- Docker (multi-stage, slim) with Cloud Run parity
- Observability that isn't print(): structured logging, Cloud Logging, Sentry
- Multilingual NLP experience (our comments span many languages)
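As a sketch of what "idempotent backfills" can mean in BigQuery: key the write on a stable id with MERGE so re-running a backfill window updates rows instead of duplicating them. The table and column names below are hypothetical, not pi-sentiment's real schema.

```python
# Hypothetical BigQuery MERGE for an idempotent backfill: re-running
# the same window updates existing rows (keyed on comment_id) rather
# than inserting duplicates, so a retried or overlapping backfill run
# leaves the warehouse in the same state.
BACKFILL_MERGE = """
MERGE `analytics.comment_sentiment` AS target
USING `analytics.comment_sentiment_staging` AS source
ON target.comment_id = source.comment_id
WHEN MATCHED THEN
  UPDATE SET sentiment = source.sentiment,
             analyzed_at = source.analyzed_at
WHEN NOT MATCHED THEN
  INSERT (comment_id, sentiment, analyzed_at)
  VALUES (source.comment_id, source.sentiment, source.analyzed_at)
"""
```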
This Role Is Not a Fit If...
Read this section carefully. If any of these describe you, please don't apply; you'll be unhappy, and so will we.
- You need detailed specs for every task. We hand you an issue and a codebase. Figuring out the how is the job. If you need a ticket broken into sub-steps before you can start, this isn't the role.
- You wait to be checked in on. Nobody is going to DM you every morning to ask how it's going. You drive your own status updates, flag slippage early, and ask for review when you're ready.
- You go silent when blocked. If you're stuck for more than a few hours and haven't said anything, that's a problem. Stuck is fine. Quiet is not. A PO and another engineer are one message away; use them.
- You expect a long onboarding ramp. You should be opening small PRs in week one and shipping something meaningful by the end of week two. We'll help, but we won't hand-hold.
Technical Environment
Core Technologies
- Language: Python 3.11 (strict typing, Pydantic v2)
- ML / LLM: Vertex AI Gemini (2.5-flash) with structured JSON output
- Cloud: Google Cloud (Cloud Run Jobs, Cloud Scheduler, Cloud Storage, BigQuery, Vertex AI)
- Region: europe-west3 (EU-focused)
Data & Storage
- Analytics warehouse: BigQuery (partitioned, clustered)
- Operational DB: Supabase (PostgreSQL with RLS), shared with the frontend
- Ingestion: Apify (15 social platform adapters), Data365 API
- Batch ML: Vertex AI Batch Prediction (JSONL in/out via GCS)
Developer Experience
- Package Manager: uv / pip
- Testing: pytest (unit, integration, E2E)
- Secrets: dotenvx (encrypted environment files)
- Containers: Docker, Cloud Run Jobs
- Version Control: GitHub with trunk-based development
- Monitoring: Cloud Logging, Sentry
What the Frontend Looks Like (so you can collaborate)
You won't own this, but you'll read it and design data for it:
- Framework: Next.js 15 (App Router), React 19, TypeScript (strict)
- Data layer: Supabase client, TanStack Query
- Auth: Supabase Auth (JWT, OAuth, RLS)
- Charts/Tables: Visx, TanStack Table
What You Get
- Real ML in production: Gemini with real cost, latency, and quality trade-offs, not prototypes
- End-to-end ownership: from ingestion to the Supabase row the frontend reads, the whole path is yours
- A small team, no silos: one PO, one frontend engineer, you. Decisions are fast because the room is small.
- Remote and async. We don't care where you work or when, as long as you communicate and ship.
- Learning budget for conferences and courses.
Our Engineering Principles
- Type Safety First: Pydantic and type hints catch bugs at the boundary, not in production
- Cost-Aware: Batch over online when it fits; measure before scaling
- Observable: Structured logs, error tracking, and metrics ship with every job
- Trunk-Based Development: Small, frequent PRs with feature flags over long-lived branches
Interview Process
- Screen (15 min): Your background, what you've shipped, and why this role.
- Take-home (5-6 hours): Small ingestion → BigQuery → LLM-enrichment task on GCP. AI-assisted development is fine; we care about the decisions, not the keystrokes.
- Code walk-through (60 min): Walk us through your solution. Expect pushback on trade-offs.
- Pairing session (60 min): Open a real pi-sentiment issue together. We want to see how you read unfamiliar code and where you ask questions.
- Offer: We move quickly for strong candidates.