Forward-Deployed Cheminformatician

Apheris


Job Location:

Berlin - Germany

Monthly Salary: Not Disclosed
Posted on: 3 days ago
Vacancies: 1 Vacancy

Job Summary

About Apheris

At Apheris, we are building the future of how AI is applied in pharmaceutical R&D.
We enable leading pharmaceutical teams to discover and develop drugs faster. We host the industry's largest federated data networks for drug discovery AI, spanning co-folding, ADMET, and antibody developability.
Across these networks, models are trained on proprietary industry datasets to achieve higher performance and broader applicability while keeping data control and IP protected. We deliver these superior models through drug discovery applications that enable teams to run them at scale, further customize them, and integrate them into existing R&D workflows.
  • AI Structural Biology (AISB) Network: Pharmaceutical companies collaborate in the field of co-folding, structure-based binding affinity predictions, and antibody design.
  • ADMET Network: Pharmaceutical and biotech companies collaborate to improve small-molecule property prediction and expand into further drug modalities.
  • Antibody Developability Network: Pharma partners collaborate to federate historical and purpose-built antibody developability datasets for secure ML training without data leaving each partner's environment.

About the role

We are looking for a Forward-Deployed Cheminformatician to own how binding data is prepared across our co-folding-focused networks. This data is the input that decides whether our co-folding and binding-affinity models perform in real drug programs. It arrives from pharma partners in heterogeneous shapes: different assay registries, different metadata, different chemical-representation standards, and different choices on qualifiers, replicates, and censoring. We need someone who turns this into a repeatable, well-documented preparation pipeline that pharma representatives can run alongside us and that scales to the public-data corpus we build for our own model training.
This is half engineering, half forward-deployed work. You will define the protocol, harden it with validators and scripts, integrate it into the Apheris products, run it with each new partner, and own the equivalent pipeline for the public binding-data corpus.

What you will do

  • Define and own the binding-data preparation protocol: data schema, small-molecule standardization, assay metadata model, value handling (KD, Ki, IC50, pIC50), qualifier and censored-value handling, and duplicate and replicate aggregation.
  • Build the tooling that runs it: modular scripts, validators with actionable errors, and reusable pipelines that survive different pharma upstream systems (Dotmatics, Spotfire, in-house registries).
  • Work forward-deployed with pharma. Sit with their biologists and medicinal chemists, walk them through the protocol, sense-check what an assay column actually measures, and unblock retrieval.
  • Maintain the small-molecule representation pipeline: RDKit standardization, tautomer and ionization handling, stereochemistry preservation, and PAINS / frequent-hitter filtering.
  • Curate the public binding-data foundation: ChEMBL, BindingDB, and PubChem BioAssay, prepared to the same standard so our models train on the strongest public baseline anyone can assemble.
  • Hand the productized pipeline cleanly to engineering for scaling, and partner with ML to keep the data contract valid as models and networks evolve.

What we expect from you

You should apply if:
  • You have a BSc, MSc, PhD, or equivalent in cheminformatics, computational chemistry, or a related field, plus 3 years preparing biological assay data in a discovery setting.
  • You are fluent in Python and RDKit. SMILES normalization, tautomer / ionization / stereochemistry handling, and scaffold extraction are second nature, and you understand why each matters for activity cliffs and model training.
  • You have hands-on experience curating quantitative binding assay data (KD, Ki, IC50, pIC50) and HTS data: censored values, qualifiers, duplicates, replicate aggregation, and assay metadata interpretation.
  • You write good engineering code: version control, tested, modular scripts, and validators that return useful errors.
  • You are comfortable forward-deployed with pharma medicinal chemists and biologists. You can sit in a sense-check meeting, pull out what is actually meant by a column label, and encode that back into the protocol.
  • You enjoy turning a messy, ad-hoc cleaning job into a repeatable protocol others can run.
Bonus points if:
  • You have practical familiarity with public binding-data sources (ChEMBL, BindingDB, PubChem BioAssay) and the gotchas in each.
  • You have applied LLM tooling (Claude, Codex, Cursor) to accelerate data cleaning or metadata harmonization.
  • You have worked across institutional data boundaries (federated, multi-party, or otherwise) where the data-preparation contract has to hold under partial visibility.
  • You have a publication record or open-source contributions in cheminformatics or quantitative pharmacology.

What we offer you

  • Industry-competitive compensation, including early-stage virtual share options
  • Remote-first work: work where you work best
  • Wellbeing budget, mental health support, work-from-home budget, co-working stipend, and learning budget
  • Generous holiday allowance
  • Office Days at our Berlin HQ or a different European location (3x per year)
  • A high-calibre, execution-focused team with experience from leading organizations

About Company

Build ML-powered products using data that spans organizational or geographical boundaries, while ensuring compliance with regulation.