Towards a Common Patient Information Schema
Adam Blum
Nov 24, 2025
What a unified, patient-centric data model could look like for breast cancer trial matching
We talked in a recent post about centralized patient data repositories. Let’s dig deeper into what such a common repository might actually look like. CancerBot, as a clinical trial matching product, is deeply focused on the structured data required for accurate clinical trial eligibility. That scope gives us a concrete testbed for assessing whether a given data schema is sufficient for real-world matching.
Of course, a broader ecosystem will eventually need far more: patient-generated health data, imaging, genomics beyond trial eligibility, quality-of-life metrics, wearable feeds, and more. But for now, the breast cancer trial-matching use case gives us a clean, bounded first iteration. Future versions will extend naturally to follicular lymphoma, multiple myeloma, and other blood cancers supported by CancerBot.
With that in mind, here is what a patient information schema could look like for breast cancer clinical trial matching.
1. Core Demographics
The foundation of any patient schema begins with the universal identifiers and sociogeographic context that trials commonly use in eligibility logic:
Age, gender, ethnicity
Country, region, postal code
Latitude/longitude (for distance-to-trial-site calculations)
These may seem mundane, but they have real implications. Many trials restrict enrollment to specific countries or regions. Others score travel burden or require “reasonable” proximity to a clinical site. Even ethnicity can sometimes influence eligibility when germline risk or pharmacogenomic variants are relevant.
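As a rough illustration of the distance-to-trial-site calculation, here is a minimal sketch using the haversine formula on stored latitude/longitude pairs. The function names and the idea of a single `max_km` cutoff are assumptions for illustration, not CancerBot's actual logic; a production system might prefer driving time over straight-line distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_travel_radius(patient_latlon, site_latlon, max_km):
    """True if the site lies within max_km (straight-line) of the patient."""
    return haversine_km(*patient_latlon, *site_latlon) <= max_km
```

Straight-line distance is only a proxy for travel burden, so a hypothetical cutoff like this is best treated as a first-pass filter.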
2. Physiologic Measurements
These are directly measured, quantifiable values that describe the patient’s physiology, and they are almost always required for baseline safety assessments.
Height, weight, BMI
Blood pressure, heart rate
Ejection fraction (cardiac function)
QTc interval (cardiac conduction risk)
Pulmonary function test summaries
These values help establish that a patient can safely receive the investigational therapy. For example, QTc matters for targeted therapies that can prolong the QT interval, and trials of HER2-directed therapies typically require a minimum ejection fraction because of their cardiotoxicity risk.
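Some of these values are derived rather than measured directly. The sketch below shows how BMI and a heart-rate-corrected QT interval could be computed from raw fields, using the standard Bazett and Fridericia corrections. Which correction a given trial expects is protocol-specific, and the function names here are illustrative assumptions.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def rr_seconds(heart_rate_bpm):
    """RR interval in seconds, derived from heart rate."""
    return 60.0 / heart_rate_bpm


def qtc_bazett(qt_ms, heart_rate_bpm):
    """Bazett correction: QT divided by the square root of RR."""
    return qt_ms / math.sqrt(rr_seconds(heart_rate_bpm))


def qtc_fridericia(qt_ms, heart_rate_bpm):
    """Fridericia correction: QT divided by the cube root of RR."""
    return qt_ms / rr_seconds(heart_rate_bpm) ** (1 / 3)
```

Storing the raw QT interval and heart rate alongside the reported QTc would let a matcher recompute whichever correction a protocol names.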
3. Clinical Status
This category describes the cancer itself and the patient’s functional ability to undergo treatment.
Disease (breast cancer subtype or diagnosis code)
Stage (AJCC staging)
ECOG and Karnofsky performance status
This is where most trial logic starts. Nearly every breast cancer trial is written around stage (e.g., metastatic vs. early) and functional status (e.g., ECOG ≤ 1). A schema that doesn’t cleanly represent these will fail to match patients accurately.
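A minimal sketch of how stage and performance status might be represented and checked follows. The `ClinicalStatus` class and its field names are hypothetical, and treating any stage group that begins with "IV" as metastatic is a simplification. Missing data returns `None` rather than `False`, so "unknown" is never confused with "does not qualify".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalStatus:
    disease: str                # subtype label or diagnosis code
    stage_group: Optional[str]  # AJCC stage group, e.g. "IIB" or "IV"; None if unknown
    ecog: Optional[int]         # 0-5; None if not recorded


def is_metastatic(status):
    """True for stage IV, False for earlier stages, None when stage is unknown."""
    if status.stage_group is None:
        return None
    return status.stage_group.strip().upper().startswith("IV")


def meets_ecog(status, max_ecog):
    """True if ECOG <= max_ecog, None when ECOG is not recorded."""
    if status.ecog is None:
        return None
    return status.ecog <= max_ecog
```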
4. Medical History
Trial protocols depend heavily on past and existing health conditions, because comorbidities directly affect safety.
Other active malignancies
Pre-existing conditions (cardiac, autoimmune, neurologic)
Neuropathy grade
HIV, hepatitis B/C status
Prior interstitial lung disease (ILD), prior pneumonitis, and ILD grade
Geographic and infectious exposure risks
Breast cancer trials increasingly exclude patients with prior ILD or pneumonitis due to the pulmonary toxicity risks of antibody–drug conjugates (ADCs). Similarly, viral infection status remains essential for immunotherapy safety.
5. Hematology, Renal, and Hepatic Diagnostics
These laboratory values form the backbone of systemic therapy eligibility.
Hematology (CBC): ANC, platelets, WBC, RBC, hemoglobin
Renal function: creatinine clearance, serum creatinine, eGFR, derived renal adequacy status
Hepatic function: AST, ALT, ALP, total and direct bilirubin, albumin
Electrolytes and minerals: serum calcium
These values appear in nearly every inclusion/exclusion section. For trial matching, the schema must store not only the values but also their units, since a surprising amount of matching logic breaks when unit conversions are missing.
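To make the unit problem concrete, here is a small sketch that normalizes a lab value to one canonical unit per analyte before any comparison, then derives creatinine clearance with the Cockcroft-Gault equation. The table covers only two analytes, and the analyte keys, unit strings, and function names are assumptions made for illustration.

```python
# Multipliers that convert a reported unit into the canonical unit for that analyte.
# Canonical units chosen here (an assumption): creatinine in mg/dL, hemoglobin in g/dL.
TO_CANONICAL = {
    ("creatinine", "mg/dL"): 1.0,
    ("creatinine", "umol/L"): 1 / 88.4,
    ("hemoglobin", "g/dL"): 1.0,
    ("hemoglobin", "g/L"): 0.1,
}


def normalize(analyte, value, unit):
    """Convert value to the canonical unit; fail loudly on an unknown unit."""
    try:
        factor = TO_CANONICAL[(analyte, unit)]
    except KeyError:
        raise ValueError(f"no conversion defined for {analyte} in {unit}")
    return value * factor


def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl
```

Raising an error on an unrecognized unit, rather than passing the raw number through, keeps a value reported in umol/L from being silently compared against a threshold written in mg/dL.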
6. Treatment History
Breast cancer therapy is highly line-dependent; trial matching must know exactly what a patient has received and whether they responded.
Treatment lines: first-line, second-line, and later-line therapies, with dates and outcomes (e.g., PR, PD, intolerant)
Supportive care: bisphosphonates, steroids, G-CSF (important but not counted as systemic lines)
Disease course: relapse count, remission duration
Refractory status: endocrine-refractory, CDK4/6-refractory, ADC-refractory
Safety and washout: washout duration, persisting toxicity grade (per CTCAE)
Prior exposure summary: prior ADC exposure (a major eligibility gate for modern trials)
This is one of the most crucial areas. Without a structured, accurate treatment history, matching is guesswork.
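One way to structure treatment history is a list of records from which line counts, washout, and prior-exposure flags are derived rather than entered by hand. The sketch below assumes one record per line of therapy. The `Treatment` class, its fields, and the drug-class labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Treatment:
    regimen: str
    start: date
    end: Optional[date]             # None while treatment is ongoing
    outcome: Optional[str] = None   # e.g. "PR", "PD", "intolerant"
    supportive: bool = False        # supportive care is not a systemic line
    drug_classes: Tuple[str, ...] = ()


def systemic_line_count(history):
    """Number of records that are not supportive care."""
    return sum(1 for t in history if not t.supportive)


def days_since_last_systemic(history, today):
    """Days since the latest completed systemic therapy; None if none has an end date."""
    ends = [t.end for t in history if not t.supportive and t.end is not None]
    if not ends:
        return None
    return (today - max(ends)).days


def has_prior_exposure(history, drug_class):
    """True if any record lists the given drug class."""
    return any(drug_class in t.drug_classes for t in history)
```

Deriving these flags from the underlying records keeps them consistent with the dated history, though real regimens that span multiple lines would need a richer model than this.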
7. Behavioral & Reproductive Safety Factors
These factors are often overlooked but are central to trial safety and compliance.
Consent and cognitive status: consent capability, mental health considerations, caregiver availability
Reproductive safety: pregnancy or lactation, pregnancy test results, contraception use
Substance use: tobacco, alcohol or other substances
Exposure risk: geographic exposure (TB, fungal diseases, parasites), occupational exposure (healthcare, lab work, industrial toxins)
Many modern breast cancer trials introduce pulmonary, hepatic, or infection-risk exclusions that depend on these behavioral or environmental factors.
Putting It All Together
What emerges is a coherent, structured, interoperable patient information schema that supports breast cancer trial matching at a level of precision that older approaches simply cannot achieve.
A schema like this:
Captures the full clinical picture needed for safe and accurate matching
Maps cleanly to OMOP and FHIR
Is extensible to additional cancers
Can support broader patient-data repositories beyond trial matching
Enables true eligibility filtering rather than guesswork or vignette-based ranking
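To illustrate what eligibility filtering over structured fields could look like, here is a minimal sketch with three-valued criteria: each check returns `True`, `False`, or `None` when the needed field is missing. The field names, thresholds, and result labels are illustrative assumptions, not a real protocol or CancerBot's implementation.

```python
from typing import Callable, Optional

Criterion = Callable[[dict], Optional[bool]]


def at_most(field, limit) -> Criterion:
    def check(patient):
        value = patient.get(field)
        return None if value is None else value <= limit
    return check


def at_least(field, limit) -> Criterion:
    def check(patient):
        value = patient.get(field)
        return None if value is None else value >= limit
    return check


def is_true(field) -> Criterion:
    def check(patient):
        value = patient.get(field)
        return None if value is None else bool(value)
    return check


def evaluate(patient, inclusion, exclusion):
    """Return 'ineligible', 'needs_data', or 'eligible'."""
    results = [criterion(patient) for criterion in inclusion]
    for criterion in exclusion:
        hit = criterion(patient)
        # An exclusion criterion that fires counts as a failed check.
        results.append(None if hit is None else not hit)
    if any(r is False for r in results):
        return "ineligible"
    if any(r is None for r in results):
        return "needs_data"
    return "eligible"
```

A hypothetical call might look like `evaluate(patient, [at_most("ecog", 1), at_least("anc", 1.5)], [is_true("prior_ild")])`. The separate `needs_data` outcome is the point of the design: a missing lab value prompts a request for data instead of silently excluding or including the patient.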
Turning frustration into innovation
After being diagnosed with follicular lymphoma, AI tech entrepreneur Adam Blum assumed he could easily find cutting-edge treatment options. Instead, he faced resistance from doctors and an exhausting search process. Determined to fix this, he built CancerBot—an AI-powered tool that makes clinical trials more accessible, helping patients find potential life-saving treatments faster.


