A Manifesto for Better Patient-Centered Precision Clinical Trial Matching
Adam Blum
Sep 24, 2025
I am heading out today to speak at Harvard’s DCI Network AI conference. I was asked to discuss my experience, as a patient, of evaluating trial matching services and how it informed what we are doing with CancerBot. This post lays out a few of my thoughts ahead of the event.
I discussed my personal journey with cancer here a few months ago. I tried every commercial, patient-accessible clinical trial matching service and found that none of them performed true precision matching against all of a trial’s eligibility criteria.
Likewise, the existing open source AI-based trial matchers did not aim for precise matching. Instead they measured themselves by how often the “doctor-labeled gold standard” trial for a “patient vignette” (a brief description of the patient and disease that lacks the full patient information) appeared within the top n (generally the top 5) trials recommended by the matcher.
This is not true eligibility assessment; instead, the ranking is an opaque mixture of eligibility and suitability (“goodness for the patient” as interpreted by a clinician with no patient input). We describe the problem with this way of measuring recommendation quality here, and propose better ways to measure true eligibility matching.
Faced with this, I felt called to build a better approach to true precision matching that patients can use directly. We call the overall system architecture EXtract Attributes from Clinical Trials (EXACT); it is described here. While EXACT is available as the free CancerBot service, it is also open source, and we are donating it to Harvard’s DCI Network and to other foundations that need a patient-centered matching service with true precision matching. There are several things CancerBot does that we would encourage other trial matchers to adopt to serve patients better:
Bot-based Onboarding
The first step is to get a fuller description of the patient’s health than the patient vignettes so often bandied about when discussing trial matchers. Of course, the best source of patient data is the electronic health record (and CancerBot does use it, as we will discuss). But even when the record is available, it is usually incomplete. A bot interview helps flesh out the information missing from the record. It can also elicit additional attributes from the patient that help qualify them for as many trials as possible, possibly prompting additional tests in the process.
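To make the onboarding idea concrete, here is a minimal Python sketch of how a bot might decide what to ask next: compare the attributes that a disease’s trials need against what the imported record already contains, and ask about the gaps. The attribute names and the `REQUIRED_ATTRIBUTES` table are hypothetical, chosen for illustration; they are not CancerBot’s actual schema.

```python
from typing import Optional

# Hypothetical attributes that a disease's trials commonly require.
# Illustrative only; not CancerBot's real attribute schema.
REQUIRED_ATTRIBUTES = {
    "follicular_lymphoma": [
        "age",
        "ecog_status",
        "flipi_score",
        "prior_therapy_lines",
        "hemoglobin_g_dl",
    ],
}


def missing_attributes(disease: str, record: dict) -> list:
    """Return the required attributes that the health record does not supply."""
    required = REQUIRED_ATTRIBUTES.get(disease, [])
    return [name for name in required if record.get(name) is None]


def next_question(disease: str, record: dict) -> Optional[str]:
    """Phrase a question about the first missing attribute, or None if complete."""
    missing = missing_attributes(disease, record)
    if not missing:
        return None
    return f"Could you tell me your {missing[0].replace('_', ' ')}?"


# Example: an EHR import that lacks a FLIPI score and treatment history.
ehr_record = {"age": 58, "ecog_status": 1, "hemoglobin_g_dl": 12.9}
print(missing_attributes("follicular_lymphoma", ehr_record))  # ['flipi_score', 'prior_therapy_lines']
print(next_question("follicular_lymphoma", ehr_record))
```

A real bot would use curated, patient-friendly wording per attribute rather than the generated question text, but the core loop of "find the gaps, then ask" stays the same.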
Eligibility-Focused
Instead of finding vaguely “good” trials for the patient, first find the trials for which the patient is truly, 100% eligible. To do this, all of the attributes necessary for inclusion or exclusion need to be extracted from the unstructured trial descriptions. It would be even better if researchers defined the eligibility criteria in structured form using approaches like MatchMiner’s CTML. But that approach seems to have been roundly ignored, and trials are still posted on clinicaltrials.gov and other registries as unstructured text.
The advent of Large Language Models would seem to offer some hope for extracting true structured eligibility criteria from trials. In theory, this could be achieved by presenting the LLM with the patient data and the full text of the trial description and asking whether they match. However, the accuracy of this approach falls short: many eligibility criteria are missed, synonyms for attributes are not always recognized, and the implied units for attributes are misapplied. Instead, the EXACT system performs per-attribute extraction based on hand-tuned prompts. A “prompt workbench” allows each attribute’s prompt to be carefully constructed, with the necessary alternate vocabulary, proper units and accurate minimum or maximum values.
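Once criteria have been extracted attribute by attribute, checking eligibility becomes a deterministic comparison rather than an LLM judgment. The sketch below assumes a simple, hypothetical representation in which each criterion is a canonical attribute name with an optional minimum and maximum (already normalized to a single unit), plus a flag marking exclusion criteria; EXACT’s actual schema may differ. Bounds are treated as inclusive, and an attribute missing from the patient record yields “possibly eligible” rather than a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    """One structured criterion (hypothetical format, not EXACT's schema)."""
    attribute: str                   # canonical name, units already normalized
    minimum: Optional[float] = None  # inclusive lower bound, if any
    maximum: Optional[float] = None  # inclusive upper bound, if any
    exclusion: bool = False          # True: a value in range EXCLUDES the patient


def in_range(value: float, c: Criterion) -> bool:
    """True if value lies within the criterion's (inclusive) bounds."""
    if c.minimum is not None and value < c.minimum:
        return False
    if c.maximum is not None and value > c.maximum:
        return False
    return True


def assess(patient: dict, criteria: list) -> str:
    """Return 'eligible', 'ineligible', or 'possibly eligible' (data missing)."""
    missing_data = False
    for c in criteria:
        value = patient.get(c.attribute)
        if value is None:
            missing_data = True  # cannot decide this criterion yet
            continue
        matched = in_range(value, c)
        if c.exclusion and matched:
            return "ineligible"  # meets an exclusion criterion
        if not c.exclusion and not matched:
            return "ineligible"  # fails an inclusion criterion
    return "possibly eligible" if missing_data else "eligible"


# Hypothetical trial: adults, ECOG 0-2, excluded if creatinine >= 2.0 mg/dL.
trial = [
    Criterion("age", minimum=18),
    Criterion("ecog_status", maximum=2),
    Criterion("creatinine_mg_dl", minimum=2.0, exclusion=True),
]
print(assess({"age": 58, "ecog_status": 1, "creatinine_mg_dl": 0.9}, trial))  # eligible
print(assess({"age": 58, "ecog_status": 3, "creatinine_mg_dl": 0.9}, trial))  # ineligible
print(assess({"age": 58, "ecog_status": 1}, trial))                           # possibly eligible
```

The “possibly eligible” outcome is where the bot interview described earlier pays off: it tells the bot exactly which attribute to ask about next.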
To assess the accuracy of this method, a different approach is needed. Instead of evaluating all the way from the patient to the trial, we focus on the trial itself and whether its eligibility attributes have been successfully extracted. The accuracy of EXACT in extracting attributes from follicular lymphoma (FL) trials is analyzed here, with results both overall (0.85 F1) and for each of the 533 attribute criteria for FL. The biomedical subject matter experts tuning prompts in the prompt workbench use this information to focus on the problem attributes. This approach, attribute-by-attribute extraction with hand-crafted prompts and iterative measurement of success, was used to extract true structured criteria for all follicular lymphoma, multiple myeloma and breast cancer trials from clinicaltrials.gov and other worldwide registries (ISRCTN, ICTRP, EUCTR and ANZCTR). We finally have a truly structured database of all these trials and their eligibility attributes.
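Scoring extraction against a trial-level gold standard is straightforward to compute. The sketch below assumes that each extracted or gold-standard criterion is represented as a (trial, attribute, value) tuple, counts an extraction as correct only when all three match, and reports a micro-averaged overall F1 alongside per-attribute F1. The post does not say exactly how the 0.85 figure was computed, so the tuple format and the averaging choice here are assumptions.

```python
from collections import defaultdict


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_extraction(gold: set, extracted: set):
    """Return (overall micro-F1, {attribute: F1}) for (trial, attribute, value) tuples."""
    counts = defaultdict(lambda: [0, 0, 0])  # attribute -> [tp, fp, fn]
    for item in extracted:
        counts[item[1]][0 if item in gold else 1] += 1  # correct vs. spurious or wrong value
    for item in gold - extracted:
        counts[item[1]][2] += 1  # missed, or extracted with the wrong value
    per_attribute = {attr: f1(*c) for attr, c in counts.items()}
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    return f1(*totals), per_attribute


# Illustrative data: the ECOG maximum for trial-A was extracted with the wrong value.
gold = {("trial-A", "age_min", 18), ("trial-A", "ecog_max", 2), ("trial-B", "age_min", 18)}
extracted = {("trial-A", "age_min", 18), ("trial-A", "ecog_max", 1), ("trial-B", "age_min", 18)}
overall, per_attribute = score_extraction(gold, extracted)
print(round(overall, 3))  # 0.667
print(per_attribute)
```

Note that a wrong value counts as both a false positive and a false negative, which is what makes per-attribute F1 useful for pointing prompt tuners at attributes such as units or thresholds that are being misread.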
Goodness for the Patient
Now that we have separated out eligibility assessment, we can evaluate “goodness” for the patient in a separate step with a robust analytical framework that is truly patient-centric. Specifically, this means recognizing that each patient has their own way of deciding what is good for them. CancerBot uses risk, benefit, patient burden and distance in a multi-criteria decision analysis (MCDA) framework to score each trial for each patient, weighting each factor according to that patient’s priorities. Some patients weigh distance more heavily than other factors; others may be more concerned with the risk of the trial protocol.
For patient burden we use the Getz et al. framework proposed here. For benefit we use the ESMO-MCBS rubric (more appropriate to cancer trials than the ASCO Value Framework). For risk and distance we present our own rubrics. The sub-scores for risk, benefit, burden and distance are then combined based on patient preferences. Full details of the scoring method are provided here. Other trial matchers may propose other scoring rubrics for risk, benefit, burden and distance (or perhaps treat distance as part of burden). The important principle is to break down the factors driving goodness and let patients choose what is important to them.
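The combination step can be illustrated with the simplest MCDA model, a weighted sum. This is a sketch under stated assumptions: sub-scores are taken to be on a 0 to 1 scale and already oriented so that higher is better for the patient (low risk, high benefit, low burden and short distance all score high), and the patient’s weights are arbitrary non-negative numbers that get normalized. CancerBot’s actual combination method is documented in its full scoring details and may differ from this.

```python
FACTORS = ("risk", "benefit", "burden", "distance")


def mcda_score(sub_scores: dict, weights: dict) -> float:
    """Weighted-sum score in [0, 1] given sub-scores in [0, 1] (higher = better)."""
    total_weight = sum(weights[f] for f in FACTORS)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[f] * sub_scores[f] for f in FACTORS) / total_weight


def rank_trials(trials: dict, weights: dict) -> list:
    """Order trial names from best to worst for this patient's weights."""
    return sorted(trials, key=lambda name: mcda_score(trials[name], weights), reverse=True)


# Two hypothetical eligible trials: A offers more benefit, B is much closer to home.
trials = {
    "trial-A": {"risk": 0.6, "benefit": 0.9, "burden": 0.6, "distance": 0.1},
    "trial-B": {"risk": 0.6, "benefit": 0.5, "burden": 0.6, "distance": 0.9},
}
benefit_focused = {"risk": 1, "benefit": 4, "burden": 1, "distance": 1}
distance_focused = {"risk": 1, "benefit": 1, "burden": 1, "distance": 4}
print(rank_trials(trials, benefit_focused))   # ['trial-A', 'trial-B']
print(rank_trials(trials, distance_focused))  # ['trial-B', 'trial-A']
```

The same two eligible trials rank differently for two patients, which is the point of keeping the weights in the patient’s hands rather than baking a single clinician judgment into the ranking.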
Interoperability
As mentioned, CancerBot can read patient records sent as FHIR resources from Epic MyChart (which reaches over 195 million patients). We are adding more FHIR sources. And of course most open source trial matchers rely on FHIR feeds now. But supporting FHIR alone is not enough.
Trial matchers should be designed to serve patients over the long term, aggregate and analyze their data, and pass any additional information they gather on to other systems to maximize the benefit for the patient. FHIR is primarily a wire format for exchanging patient information and is not optimized for such warehousing and analysis needs. If we use a standard storage model like OMOP, then third-party tools designed to work with OMOP can use the data. One example is the OMOP Data Quality Dashboard, which performs a range of validations on the data. This is crucial because the FHIR feeds from hospitals and other upstream providers may contain inconsistencies and missing information that such checks can catch.
For a performant matcher, however, the normalized OMOP structure (a Person table joining to many different tables for observations, measurements, drugs and procedures) is far too slow and unwieldy to search directly. Instead, we also need a flattened, denormalized table with the attributes necessary for each disease’s clinical trial matching. It turns out that machine learning prediction models need the same flat structure.
At CancerBot, we are leading the DCI Network patient data definition working group. One of the group’s deliverables is this “OMOP-derived flat patient structure,” along with open source transformation tools to generate the flat schema. Our hope is that these efforts will encourage other matchers to adopt OMOP and these open extensions, so that trial matchers work well in a larger open ecosystem.
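The flattening step can be sketched as a pivot from OMOP’s long layout (one row per measurement) into one wide row per patient. The sketch below uses an in-memory SQLite database with a tiny subset of OMOP CDM columns (`person.year_of_birth`, `measurement.measurement_concept_id`, `measurement_date`, `value_as_number`) and keeps the most recent value of each measurement. The concept IDs, flat column names and the `demo_connection` helper are placeholders for illustration; this is not the working group’s schema or tooling.

```python
import sqlite3

# Placeholder concept IDs mapped to flat column names. Illustrative only:
# real OMOP concept IDs come from the OHDSI standardized vocabularies.
FLAT_MEASUREMENTS = {1001: "hemoglobin_g_dl", 1002: "platelets_10e9_l"}


def flatten(conn: sqlite3.Connection) -> dict:
    """Return {person_id: flat row}, keeping each measurement's latest value."""
    flat = {}
    for person_id, year_of_birth in conn.execute("SELECT person_id, year_of_birth FROM person"):
        row = {"person_id": person_id, "year_of_birth": year_of_birth}
        row.update({column: None for column in FLAT_MEASUREMENTS.values()})
        flat[person_id] = row
    measurements = conn.execute(
        "SELECT person_id, measurement_concept_id, value_as_number "
        "FROM measurement ORDER BY measurement_date"  # ISO dates sort chronologically
    )
    for person_id, concept_id, value in measurements:
        column = FLAT_MEASUREMENTS.get(concept_id)
        if column is not None and person_id in flat:
            flat[person_id][column] = value  # later rows overwrite earlier ones
    return flat


def demo_connection() -> sqlite3.Connection:
    """Hypothetical helper: a tiny OMOP-shaped database with one sample patient."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE measurement (person_id INTEGER, measurement_concept_id INTEGER,
                                  measurement_date TEXT, value_as_number REAL);
        INSERT INTO person VALUES (1, 1966);
        INSERT INTO measurement VALUES
            (1, 1001, '2025-06-02', 12.9),
            (1, 1001, '2025-01-10', 11.8),
            (1, 1002, '2025-06-02', 180);
    """)
    return conn


print(flatten(demo_connection())[1])
```

In production the same pivot would more likely run as SQL that materializes a flat table, but the shape of the result (one row per patient, one column per matching attribute) is what makes both fast trial search and model training practical.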
Navigator-Oriented
Once a patient chooses a trial that they are eligible for and have decided is good for them, they can express interest in it. This launches an interview that attempts to answer their questions and concerns about the trial, as a patient navigator would. In fact, the bot script was developed with the help of existing patient navigators and adopts a patient navigator’s persona in its tone and style.
We want to fully inform the patient about risks and burdens and ensure that they truly understand what is involved, to reduce later dropouts, which are costly for both the patient and the researcher. If the patient has repeated questions, the bot can escalate to a human patient navigator. After the interview, its content is stored and provided to the patient navigator or trial researcher.
Conclusion
We built CancerBot because we did not see any commercial or open source trial matcher determining eligibility accurately or properly evaluating goodness for each patient. My hope is that the principles listed here can help all clinical trial matching services improve and serve patients better. Specifically:
Bot interviews for completing patient health information
Eligibility assessment that is both complete and accurate
Goodness scoring based on risk, benefit, burden and distance
Interoperability with FHIR and OMOP, with appropriate extensions
Navigator-oriented: interviews escalate to navigators and the content is driven by their insights
If you are a patient, we hope you will investigate clinical trials as part of evaluating your care options (trials are often not mentioned or encouraged, as was the case for me). These factors may help you choose a matching service. If you are a matching service implementer, we suggest that following these principles can improve your service. Let’s all work together to find trials that patients are truly eligible for and that are good for them based on their unique needs.
Turning frustration into innovation
After being diagnosed with follicular lymphoma, AI tech entrepreneur Adam Blum assumed he could easily find cutting-edge treatment options. Instead, he faced resistance from doctors and an exhausting search process. Determined to fix this, he built CancerBot, an AI-powered tool that makes clinical trials more accessible, helping patients find potentially life-saving treatments faster.


