Blog

Towards a Common Patient Information Schema

Adam Blum

Nov 24, 2025

What a unified, patient-centric data model could look like for breast cancer trial matching

We talked in a recent post about centralized patient data repositories. Let’s dig deeper into what such a common repository might actually look like. CancerBot, as a clinical trial matching product, is deeply focused on the structured data required for accurate clinical trial eligibility. That scope gives us a concrete testbed for assessing whether a given data schema is sufficient for real-world matching.

Of course, a broader ecosystem will eventually need far more: patient-generated health data, imaging, genomics beyond trial eligibility, quality of life metrics, wearable feeds, and more. But for now, the breast cancer trial-matching use case gives us a clean, bounded first iteration. Future versions will extend naturally to follicular lymphoma, multiple myeloma, and other blood cancers supported by CancerBot.

With that in mind, here is what a patient information schema could look like for breast cancer clinical trial matching.

1. Core Demographics

The foundation of any patient schema is the basic demographic and sociogeographic context that trials commonly use in eligibility logic:

  • Age, gender, ethnicity

  • Country, region, postal code

  • Latitude/longitude (for distance-to-trial-site calculations)

These may seem mundane, but they have real implications. Many trials restrict enrollment to specific countries or regions. Others score travel burden or require “reasonable” proximity to a clinical site. Even ethnicity can sometimes influence eligibility when germline risk or pharmacogenomic variants are relevant.
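As a rough illustration of the distance-to-trial-site calculation, the sketch below applies the haversine great-circle formula to stored latitude/longitude pairs. The function names and the travel limit are hypothetical; a real protocol may define "reasonable" proximity by drive time or in some other way.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_travel_limit(patient_latlon, site_latlon, max_km):
    """Hypothetical helper: True if the site lies within max_km of the patient."""
    return haversine_km(*patient_latlon, *site_latlon) <= max_km
```

Straight-line distance is only a proxy for travel burden, so a matcher would probably treat it as a first-pass filter rather than a final answer.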

2. Physiologic Measurements

These are quantifiable values that describe the patient’s physiology, and trials almost always require them for baseline safety assessments.

  • Height, weight, BMI

  • Blood pressure, heart rate

  • Ejection fraction (cardiac function)

  • QTc interval (cardiac conduction risk)

  • Pulmonary function test summaries

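These values help establish that a patient can safely receive the investigational therapy. For example, QTc limits are common in trials of targeted therapies that can prolong the QT interval, and a minimum ejection fraction is a standard requirement in HER2-directed therapy trials because of the cardiotoxicity risk.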
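Some of these fields are derived rather than measured directly. The sketch below computes BMI from height and weight, and a heart-rate-corrected QT interval using the Fridericia formula. The post does not say which QT correction the schema assumes, so Fridericia is an assumption here (Bazett is another common choice), and the function names are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def qtc_fridericia_ms(qt_ms, heart_rate_bpm):
    """QT interval corrected for heart rate: QTcF = QT / RR^(1/3), RR in seconds."""
    rr_seconds = 60 / heart_rate_bpm
    return qt_ms / rr_seconds ** (1 / 3)
```

Storing the raw QT interval and heart rate alongside the corrected value lets a matcher recompute QTc with whichever formula a given protocol names.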

3. Clinical Status

This category describes the cancer itself and the patient’s functional ability to undergo treatment.

  • Disease (breast cancer subtype or diagnosis code)

  • Stage (AJCC staging)

  • ECOG and Karnofsky performance status

This is where most trial logic starts. Nearly every breast cancer trial is written around stage (e.g., metastatic vs. early) and functional status (e.g., ECOG ≤ 1). A schema that doesn’t cleanly represent these will fail to match patients accurately.
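A minimal sketch of how stage and performance status might be represented and checked. The class and field names are hypothetical, not CancerBot's actual schema. Missing values return `None` rather than `False`, so a matcher can tell "not eligible" apart from "not yet known".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalStatus:
    disease: str                      # subtype label or diagnosis code
    ajcc_stage: Optional[str] = None  # e.g. "IIA", "IIIB", "IV"
    ecog: Optional[int] = None        # 0 (fully active) to 5


def is_stage_iv(status: ClinicalStatus) -> Optional[bool]:
    """True for AJCC stage IV; None when the stage is unknown."""
    if status.ajcc_stage is None:
        return None
    return status.ajcc_stage.strip().upper().startswith("IV")


def meets_ecog_max(status: ClinicalStatus, max_ecog: int) -> Optional[bool]:
    """Checks a criterion such as 'ECOG <= 1'; None when ECOG is unknown."""
    if status.ecog is None:
        return None
    return status.ecog <= max_ecog
```

Stage at diagnosis is a simplification: a patient first staged as early disease can later recur with metastases, so a fuller schema would likely track current disease extent separately.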

4. Medical History

Trial protocols depend heavily on past and existing health conditions, because comorbidities directly affect safety.

  • Other active malignancies

  • Pre-existing conditions (cardiac, autoimmune, neurologic)

  • Neuropathy grade

  • HIV, hepatitis B/C status

  • Prior interstitial lung disease (ILD), prior pneumonitis, and ILD grade

  • Geographic and infectious exposure risks

Breast cancer trials increasingly exclude patients with prior ILD or pneumonitis due to the pulmonary toxicity risks of antibody–drug conjugates (ADCs). Similarly, viral infection status remains essential for immunotherapy safety.

5. Hematology, Renal, and Hepatic Diagnostics

The backbone of systemic therapy eligibility.

Hematology (CBC)

  • ANC

  • Platelets

  • WBC, RBC

  • Hemoglobin

Renal function

  • Creatinine clearance

  • Serum creatinine

  • eGFR

  • Derived renal adequacy status

Hepatic function

  • AST, ALT, ALP

  • Total and direct bilirubin

  • Albumin

Electrolytes & Minerals

  • Serum calcium

These values appear in nearly every inclusion/exclusion section. For trial matching, the schema must store not only the values but also their units, because a surprising amount of matching logic breaks when unit conversions are missing.
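To make the unit point concrete, here is a sketch that stores a lab value together with its unit, normalizes serum creatinine to mg/dL, and estimates creatinine clearance with the Cockcroft-Gault formula. The `LabValue` class and the unit strings are assumptions for illustration. The post does not say how the schema derives creatinine clearance, so Cockcroft-Gault is a swapped-in example of one common method.

```python
from dataclasses import dataclass

UMOL_L_PER_MG_DL = 88.4  # creatinine: 1 mg/dL is about 88.4 umol/L


@dataclass(frozen=True)
class LabValue:
    name: str
    value: float
    unit: str  # stored explicitly, never implied


def creatinine_mg_dl(lab: LabValue) -> float:
    """Normalize a serum creatinine result to mg/dL."""
    if lab.unit == "mg/dL":
        return lab.value
    if lab.unit == "umol/L":
        return lab.value / UMOL_L_PER_MG_DL
    raise ValueError(f"Unsupported creatinine unit: {lab.unit!r}")


def cockcroft_gault_ml_min(age_years, weight_kg, scr_mg_dl, female):
    """Estimated creatinine clearance (mL/min) via Cockcroft-Gault."""
    crcl = (140 - age_years) * weight_kg / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl
```

Raising on an unrecognized unit, instead of guessing, keeps a silently wrong value from flowing into an eligibility decision.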

6. Treatment History

Breast cancer therapy is highly line-dependent; trial matching must know exactly what a patient has received and whether they responded.

Treatment Lines

  • First-line, second-line, later-line therapies

  • Dates

  • Outcomes (e.g., PR, PD, intolerant)

Supportive Care

  • Bisphosphonates, steroids, G-CSF (important, but not counted as systemic lines)

Disease Course

  • Relapse count

  • Remission duration

Refractory Status

  • Endocrine-refractory

  • CDK4/6-refractory

  • ADC-refractory

Safety & Washout

  • Washout duration

  • Persisting toxicity grade (per CTCAE)

Prior Exposure Summary

  • Prior ADC exposure (a major eligibility gate for modern trials)

This is one of the most crucial areas. Without a structured, accurate treatment history, matching is guesswork.
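One way to structure this, sketched under assumptions: each treatment record carries a line number (left empty for supportive care), a drug class, dates, and an outcome. The class name, the `drug_class` labels, and the helper functions below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Treatment:
    regimen: str
    drug_class: str                # assumed labels: "endocrine", "cdk4_6", "adc", "supportive"
    start: date
    end: Optional[date] = None     # None means still ongoing
    line: Optional[int] = None     # None for supportive care (not a systemic line)
    outcome: Optional[str] = None  # e.g. "PR", "PD", "intolerant"


def systemic_line_count(history: List[Treatment]) -> int:
    """Distinct lines of systemic therapy; supportive care is excluded."""
    return len({t.line for t in history if t.line is not None})


def has_prior_exposure(history: List[Treatment], drug_class: str) -> bool:
    """True if any record belongs to the given drug class."""
    return any(t.drug_class == drug_class for t in history)


def washout_days(history: List[Treatment], as_of: date) -> Optional[int]:
    """Days since the last systemic therapy ended; 0 if one is ongoing, None if none recorded."""
    systemic = [t for t in history if t.line is not None]
    if not systemic:
        return None
    if any(t.end is None for t in systemic):
        return 0
    return (as_of - max(t.end for t in systemic)).days
```

Keying the line count on the `line` field, rather than counting records, lets a combination regimen such as an endocrine agent plus a CDK4/6 inhibitor count as a single line.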

7. Behavioral & Reproductive Safety Factors

Often overlooked, but central to trial safety and compliance.

Consent & Cognitive Status

  • Consent capability

  • Mental health considerations

  • Caregiver availability

Reproductive Safety

  • Pregnancy or lactation

  • Pregnancy test results

  • Contraception use

Substance Use

  • Tobacco

  • Alcohol or other substances

Exposure Risk

  • Geographic exposure (TB, fungal diseases, parasites)

  • Occupational exposure (healthcare, lab work, industrial toxins)

Many modern breast cancer trials introduce pulmonary, hepatic, or infection-risk exclusions that depend on these behavioral or environmental factors.

Putting It All Together

What emerges is a coherent, structured, interoperable patient information schema that supports breast cancer trial matching at a level of precision that older approaches simply cannot achieve.

A schema like this:

  • Captures the full clinical picture needed for safe and accurate matching

  • Maps cleanly to OMOP and FHIR

  • Is extensible to additional cancers

  • Can support broader patient-data repositories beyond trial matching

  • Enables true eligibility filtering — not guesswork or vignette-based ranking
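To illustrate what filtering against structured fields could mean in practice, here is a minimal sketch. Each criterion is a function over a patient record that returns `True`, `False`, or `None` when the field is missing, and the overall result distinguishes "ineligible" from "needs more data". The field names, thresholds, and result labels are illustrative assumptions, not taken from any actual protocol or from CancerBot's implementation.

```python
from typing import Callable, Dict, Iterable, Optional

Patient = Dict[str, Optional[float]]
Criterion = Callable[[Patient], Optional[bool]]


def max_value(field: str, limit: float) -> Criterion:
    """Criterion: patient[field] <= limit; None if the field is missing."""
    def check(patient: Patient) -> Optional[bool]:
        value = patient.get(field)
        return None if value is None else value <= limit
    return check


def min_value(field: str, limit: float) -> Criterion:
    """Criterion: patient[field] >= limit; None if the field is missing."""
    def check(patient: Patient) -> Optional[bool]:
        value = patient.get(field)
        return None if value is None else value >= limit
    return check


def evaluate(patient: Patient, criteria: Iterable[Criterion]) -> str:
    """Any failed criterion excludes; otherwise any unknown asks for more data."""
    results = [criterion(patient) for criterion in criteria]
    if any(r is False for r in results):
        return "ineligible"
    if any(r is None for r in results):
        return "needs_more_data"
    return "eligible"
```

A usage example with made-up thresholds:

```python
criteria = [max_value("ecog", 1), min_value("lvef_percent", 50), min_value("anc_10e9_per_l", 1.5)]
evaluate({"ecog": 1, "lvef_percent": 55, "anc_10e9_per_l": None}, criteria)
```

Here the missing ANC yields "needs_more_data" rather than a silent pass or fail, which is the practical difference between filtering on structured fields and guessing.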

About CancerBot

Turning frustration into innovation

After being diagnosed with follicular lymphoma, AI tech entrepreneur Adam Blum assumed he could easily find cutting-edge treatment options. Instead, he faced resistance from doctors and an exhausting search process. Determined to fix this, he built CancerBot, an AI-powered tool that makes clinical trials more accessible and helps patients find potentially life-saving treatments faster.

Start your search for clinical trials now

New treatment options could be just a click away. Start a chat with CancerBot today and get matched with clinical trials tailored to you—quickly, easily, and at no cost.
