We are looking for a highly skilled Senior Backend / Data Engineer (Python, GCP, Vertex AI) to work on pi-sentiment, an existing Python-based sentiment analysis and social data pipeline on Google Cloud, for a remote, Europe-based client. For this role we only process candidates who are based in the Philippines and have legal authorization to work in the Philippines.
About the OTA Client
We build analytics tools for creators, influencers, and marketers. We pull data from Instagram, TikTok, Facebook, LinkedIn, X/Twitter, and YouTube, run AI-powered sentiment and keyword analysis on it, and serve it to users through dashboards. Small team, real users, real revenue.
The Role
This is not a greenfield role. The codebase, patterns, components, and infrastructure are already in place. Your work will be extending existing features, fixing bugs, and filling gaps, not designing systems from scratch. We need someone who can drop into an unfamiliar codebase, figure out how it works by reading the code, and start shipping within the first two weeks.
You'll work closely with our Senior Frontend Engineer, shipping schema changes, API contracts, and Supabase tables they consume, so you need to be comfortable reading a codebase and reasoning about how your data surfaces in the product.
How We Work
- Autonomy is the default. We point you at an issue and expect you to own it end to end. We don't assign tasks step by step.
- Proactive communication is non-negotiable. If you're stuck, say so immediately; don't go quiet. A PO and another engineer are available for questions, and we expect you to use them.
What Youll Do
Core Responsibilities
- Extend the Sentiment Pipeline: Work within the existing end-to-end flow (Supabase RPC → data scraping/ingestion → BigQuery → Vertex AI batch sentiment predictions), adding features and fixing bugs without breaking what works
- Add & Maintain Platform Integrations: Extend existing Apify-based adapters across Instagram, TikTok, Facebook, YouTube, LinkedIn, and X/Twitter, handling auth, rate limits, schema drift, and backfills
- Ship Cloud Run Jobs: Modify and add containerized Python jobs following existing patterns (SIGTERM handling, structured logging, idempotent retries)
- Evolve Data Contracts: Change BigQuery schemas and Supabase tables/RPCs without breaking the frontend; coordinate migrations with the frontend engineer
- Tune Models & Prompts: Iterate on Gemini structured outputs (Pydantic schemas, enums) to keep sentiment and keyword extraction accurate across languages and platforms
- Benchmark & Evaluate: Use the existing benchmarking/ suite to compare model configs on cost, latency, and quality
- Write Tests: Add pytest coverage for your changes, with unit, integration, and E2E tests where warranted
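The Cloud Run Jobs pattern named above (SIGTERM handling, structured logging, idempotent retries) can be sketched roughly like this; every name here is illustrative, not taken from the actual pi-sentiment codebase:

```python
import json
import signal

# Rough sketch of the job pattern: catch SIGTERM, emit structured JSON
# logs, and keep the work loop idempotent so a retried job run never
# double-processes an item.

shutdown_requested = False

def _handle_sigterm(signum, frame):
    # Cloud Run sends SIGTERM shortly before killing the container;
    # set a flag so the loop can finish the current item and exit cleanly.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, _handle_sigterm)

def log_json(severity: str, message: str, **fields) -> None:
    # Cloud Logging parses JSON lines on stdout into structured entries.
    print(json.dumps({"severity": severity, "message": message, **fields}))

def process_batch(items: list[str], already_done: set[str]) -> set[str]:
    # Idempotent: items already recorded as done are skipped, so
    # re-running the same batch is safe.
    for item in items:
        if shutdown_requested:
            log_json("WARNING", "shutdown requested, stopping early")
            break
        if item in already_done:
            continue
        # ... real work (scrape, enrich, write) would happen here ...
        already_done.add(item)
        log_json("INFO", "processed", item=item)
    return already_done

done = process_batch(["post-1", "post-2"], set())
```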
What Were Looking For
Required
- Strong Python in production: type hints, Pydantic, pytest, clean module boundaries. Years matter less than evidence; show us code you've shipped.
- GCP under load: Cloud Run, BigQuery, Cloud Storage. You've operated it, not just prototyped.
- SQL that survives review: complex BigQuery or Postgres, window functions, partitioning, query optimization.
- LLM integration in production: you've shipped a feature backed by Vertex AI, OpenAI, or Anthropic, and you know what structured outputs and prompt regressions feel like.
- Cross-stack literacy: you can read a Next.js / TypeScript PR, understand what data it needs, and co-design the contract with our frontend engineer. Writing React is not required.
- Proactive operator: you drive your own work, flag blockers fast, and don't wait to be assigned. See the "This Role Is Not a Fit If..." section below; we mean it.
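As a taste of what "structured outputs" means in practice, here is a minimal Pydantic v2 sketch: an enum-constrained schema rejects out-of-contract LLM responses at the boundary. The schema below is hypothetical, not the actual pi-sentiment contract.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

class CommentAnalysis(BaseModel):
    # Hypothetical response schema for a sentiment + keyword call.
    sentiment: Sentiment
    keywords: list[str]
    language: str

# A well-formed model response validates cleanly...
analysis = CommentAnalysis.model_validate_json(
    '{"sentiment": "positive", "keywords": ["launch", "pricing"], "language": "en"}'
)

# ...while a prompt regression that emits an out-of-enum label fails
# loudly at the boundary instead of poisoning downstream tables.
try:
    CommentAnalysis.model_validate_json(
        '{"sentiment": "meh", "keywords": [], "language": "en"}'
    )
    regression_caught = False
except ValidationError:
    regression_caught = True
```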
Preferred
- Vertex AI Gemini specifically (Batch Prediction, structured JSON output with enums)
- Supabase / PostgreSQL with RLS, RPCs, migrations, multi-tenant patterns
- Apify or similar ingestion platforms for social data
- Data pipeline depth: idempotent backfills, schema evolution, cost engineering (BigQuery slots, batch vs. online)
- Docker (multi-stage, slim) with Cloud Run parity
- Observability that isn't print(): structured logging, Cloud Logging, Sentry
- Multilingual NLP experience (our comments span many languages)
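As a sketch of what "idempotent backfills" can mean in BigQuery: key the write on a stable id with MERGE so re-running a backfill window updates rows instead of duplicating them. The table and column names below are hypothetical, not pi-sentiment's real schema.

```python
# Hypothetical BigQuery MERGE for an idempotent backfill: re-running
# the same window updates existing rows (keyed on comment_id) rather
# than inserting duplicates, so a retried or overlapping backfill run
# leaves the warehouse in the same state.
BACKFILL_MERGE = """
MERGE `analytics.comment_sentiment` AS target
USING `analytics.comment_sentiment_staging` AS source
ON target.comment_id = source.comment_id
WHEN MATCHED THEN
  UPDATE SET sentiment = source.sentiment,
             analyzed_at = source.analyzed_at
WHEN NOT MATCHED THEN
  INSERT (comment_id, sentiment, analyzed_at)
  VALUES (source.comment_id, source.sentiment, source.analyzed_at)
"""
```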
This Role Is Not a Fit If...
Read this section carefully. If any of these describe you, please don't apply; you'll be unhappy, and so will we.
- You need detailed specs for every task. We hand you an issue and a codebase. Figuring out the how is the job. If you need a ticket broken into sub-steps before you can start, this isn't the role.
- You wait to be checked in on. Nobody is going to DM you every morning to ask how it's going. You drive your own status updates, flag slippage early, and ask for review when you're ready.
- You go silent when blocked. If you're stuck for more than a few hours and haven't said anything, that's a problem. Stuck is fine. Quiet is not. A PO and another engineer are one message away; use them.
- You expect a long onboarding ramp. You should be opening small PRs in week one and shipping something meaningful by the end of week two. We'll help, but we won't hand-hold.
Technical Environment
Core Technologies
- Language: Python 3.11 (strict typing, Pydantic v2)
- ML / LLM: Vertex AI Gemini (2.5-flash) with structured JSON output
- Cloud: Google Cloud (Cloud Run Jobs, Cloud Scheduler, Cloud Storage, BigQuery, Vertex AI)
- Region: europe-west3 (EU-focused)
Data & Storage
- Analytics warehouse: BigQuery (partitioned, clustered)
- Operational DB: Supabase (PostgreSQL with RLS), shared with the frontend
- Ingestion: Apify (15 social platform adapters), Data365 API
- Batch ML: Vertex AI Batch Prediction (JSONL in/out via GCS)
Developer Experience
- Package Manager: uv / pip
- Testing: pytest (unit, integration, E2E)
- Secrets: dotenvx (encrypted environment files)
- Containers: Docker, Cloud Run Jobs
- Version Control: GitHub with trunk-based development
- Monitoring: Cloud Logging, Sentry
What the Frontend Looks Like (so you can collaborate)
You won't own this, but you'll read it and design data for it:
- Framework: Next.js 15 (App Router), React 19, TypeScript (strict)
- Data layer: Supabase client, TanStack Query
- Auth: Supabase Auth (JWT, OAuth, RLS)
- Charts/Tables: Visx, TanStack Table
What You Get
- Real ML in production: Gemini with real cost, latency, and quality trade-offs, not prototypes
- End-to-end ownership: from ingestion to the Supabase row the frontend reads, the whole path is yours
- A small team, no silos: one PO, one frontend engineer, you. Decisions are fast because the room is small.
- Remote and async. We don't care where you work or when, as long as you communicate and ship.
- Learning budget for conferences and courses.
Our Engineering Principles
- Type Safety First: Pydantic and type hints catch bugs at the boundary, not in production
- Cost-Aware: Batch over online when it fits; measure before scaling
- Observable: Structured logs, error tracking, and metrics ship with every job
- Trunk-Based Development: Small, frequent PRs with feature flags over long-lived branches
Interview Process
- Screen (15 min): Your background, what you've shipped, and why this role.
- Take-home (5-6 hours): Small ingestion → BigQuery → LLM-enrichment task on GCP. AI-assisted development is fine; we care about the decisions, not the keystrokes.
- Code walk-through (60 min): Walk us through your solution. Expect pushback on trade-offs.
- Pairing session (60 min): Open a real pi-sentiment issue together. We want to see how you read unfamiliar code and where you ask questions.
- Offer: We move quickly for strong candidates.