A Vision for Centralized Patient Data Repositories
Adam Blum
Nov 3, 2025
Empowering patients, enabling clinicians, and unlocking the full potential of healthcare data
When Your Health Story Lives in Four Systems
When I was diagnosed with follicular lymphoma, my medical information ended up scattered across four healthcare systems in three countries.
My initial labs were done in France, my follow-up appointments were with my private GP in Scotland, my diagnostic work and treatment went through NHS Scotland, and I then requested a consultation with MD Anderson Cancer Center in Houston.
Each institution held part of my story — but none had the complete picture.
When MD Anderson asked for my records, the NHS Subject Access Request could only produce DICOM images of CT and PET-CT scans. There was no structured electronic data for labs, diagnostics, or biomarkers. Despite everything being “digital,” the information couldn’t actually move.
Even if each provider had given me a data file, I still would have had to collate and interpret dozens of test results, medications, and reports to create a consistent record. Which version of my labs should be considered the truth? How could I ensure continuity of care across systems that don’t talk to each other?
That experience crystallized something for me:
We need a patient-controlled, centralized repository for health data — one that aggregates structured information from all sources, so both patients and clinicians can make informed, confident decisions.
The Missing Link: A Home for Patient Data
The healthcare industry has spent years working on data interoperability — and progress has been real. The FHIR standard (Fast Healthcare Interoperability Resources) has made it easier for systems to talk to each other.
But FHIR doesn’t define where data should live or how it should be assembled into a single, coherent patient record. It standardizes the pipes — not the container.
That’s why, despite decades of investment, patients still find themselves chasing PDFs, CDs, or portals that only hold fragments of their health history.
Regional Health Information Exchanges (HIEs) attempt to solve part of this problem, but they remain fragmented by geography and policy. There’s an opportunity now — especially with modern privacy controls and cloud architecture — to design something strategic, global, and patient-centric from the ground up.
At CancerBot, we’re deeply interested in consuming this kind of unified data repository (with patient authorization) for clinical trial matching. Having a trustworthy, structured record would make it vastly easier to match patients to precision medicine studies quickly and safely.
The Harvard-Radcliffe Initiative
Last week, I joined a Harvard Radcliffe Institute working seminar on building a long-term patient data repository — an effort led by Yuri Quintana, who assembled an extraordinary group of experts in clinical informatics, patient advocacy, and data standards.
Several patients shared their personal experiences. I hadn’t expected to tell mine, but Yuri invited me to speak about the challenges of assembling my fragmented records.
The most powerful story came from Betsy Lowe, a mother who maintains Excel spreadsheets to curate the medical histories of several of her children living with chronic illness. Betsy’s experience drove home how far we still have to go — and how urgently we need tools that make this easier.
Among the participants were:
Alexa McCray, creator of clinicaltrials.gov, the first clinical trials registry, decades ahead of its time
Dave deBronkart (“e-Patient Dave”), a pioneer in patient data rights
Cait DesRoches, who for years has steered the OpenNotes initiative to the success it is today
Their presence underscored the goal: this must be a patient-first repository — something that gives individuals agency over their complete health record and enables clinicians to provide better, safer care.
What a Patient-Centered Repository Could Enable
Our workgroups identified two key principles:
1. Patient Control of Data Creation
Patients can grant write access to their providers.
They can collate and reconcile records from multiple sources.
They decide which version of overlapping data to treat as authoritative.
They can enrich their record with lifestyle, environment, and wearable data (diet, exercise, Apple Watch, Fitbit, etc.).
2. Patient Control of Data Usage
Patients choose who can see what — providers, caregivers, peers, or navigators.
They can delegate access rights to a trusted clinician or family member.
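To make the two principles concrete, here is a minimal sketch of how patient-issued access grants might be modeled. It is an illustration only, not a design the workgroups produced: the names AccessGrant and PatientRecordPolicy, the category labels, and the expiry rule are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccessGrant:
    """One permission issued by the patient (hypothetical model)."""
    grantee: str                    # e.g. a clinician, caregiver, or navigator
    categories: frozenset           # record categories the grantee may access
    can_write: bool = False         # write access, e.g. a provider adding results
    expires: Optional[date] = None  # None means the grant has no end date


@dataclass
class PatientRecordPolicy:
    """The set of grants the patient currently has in force."""
    grants: list = field(default_factory=list)

    def grant(self, new_grant: AccessGrant) -> None:
        self.grants.append(new_grant)

    def revoke(self, grantee: str) -> None:
        # Remove every grant held by this grantee.
        self.grants = [g for g in self.grants if g.grantee != grantee]

    def allows(self, grantee, category, write=False, today=None) -> bool:
        today = today or date.today()
        for g in self.grants:
            if g.grantee != grantee:
                continue
            if g.expires is not None and g.expires < today:
                continue  # grant has lapsed
            if category not in g.categories:
                continue
            if write and not g.can_write:
                continue  # read-only grant cannot be used to write
            return True
        return False


# Example: the patient gives an oncologist write access to labs and imaging,
# and gives a caregiver time-limited read access to medications.
policy = PatientRecordPolicy()
policy.grant(AccessGrant("oncologist", frozenset({"labs", "imaging"}), can_write=True))
policy.grant(AccessGrant("caregiver", frozenset({"medications"}), expires=date(2026, 1, 1)))
```

The point of the sketch is that every decision (who, which data, read or write, for how long) is a record the patient creates and can revoke, rather than a setting buried in a provider's system.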
Why It Matters
Imagine a world where your entire medical record simply exists — securely updated by each provider through FHIR feeds, automatically organized, and available when you need it.
For Patients
No more managing binders or chasing records.
A complete view of your history, medications, and results.
Control over what’s shared and with whom.
The ability to opt in to share anonymized data for research — even earning royalties when your data contributes to discoveries.
For Clinicians
More complete data at the point of care.
Better diagnostic accuracy and continuity.
Less administrative overhead.
Stronger patient relationships built on transparency.
For Researchers and AI Developers
High-quality, structured data that accelerates discovery.
Ability to train models responsibly using consented, representative datasets.
When data is complete, AI assistants and navigators can meaningfully support patients and clinicians alike — from interpreting lab results to identifying trial matches or care pathways.
Barriers We Must Overcome
The hardest problems are not technical — they’re institutional and cultural. We identified several major barriers:
1. Provider trust: Clinicians hesitate to use data from other institutions (or patients) for liability reasons.
Potential fix: regulations that recognize verified, patient-controlled repositories as trusted sources.
2. Data ownership: Some institutions resist sharing data for fear of losing patients.
Response: patient-centered systems can strengthen continuity, not weaken it.
3. Billing and incentives: Providers may prefer repeating diagnostics they can bill for.
Solution: policy incentives for data reuse and interoperability compliance.
4. Error correction: Institutions are reluctant to correct legacy data.
Solution: feedback mechanisms that log corrections transparently and return them to source systems.
Our collective answer to skepticism that “patients aren’t competent to manage their own records” was simple:
“If not the patient — who?”
Defining Success: SMART Goals
We converted our vision into specific, measurable goals. Among them:
Patient signups: 13 million within five years of launch (about 5% of U.S. adults — a proven tipping point for adoption).
Engagement: Each user logs at least one session per year within two years.
Referrals: 5% of users refer peers within two years.
Satisfaction: Two-thirds of users rate the service as valuable.
Institutional participation: 20% of FHIR-adopting providers push data within five years.
Carer enablement: 5% of users have linked caregiver accounts.
These metrics give the project both vision and accountability — clear milestones to measure progress against.
Speaking to Stakeholders
Different audiences have different motivations, and we defined simple, focused messages for each:
Motivated Patient: “Optimize your health with the full picture of your data. Choose what’s shared, to whom, and for what.”
Caregiver: “Helping someone manage their health is hard enough. We simplify the information part.”
Clinician: “Caring for the whole patient just got easier.”
Hospital: “More complete data, better outcomes.”
Funder (Tech/Data): “Accelerate an ecosystem built on high-quality, consented patient data.”
Funder (Social Good): “Empower patients and families to take charge of their health.”
Milestones Toward Delivery
We proposed a phased roadmap — each milestone is immediately useful and adds tangible value over the previous one:
1. FHIR-Importing Mobile App:
A smartphone app that aggregates data from providers via FHIR push feeds. Data can live only on the device if patients prefer.
2. Web-Based Portal:
Adds a hosted version for patients comfortable with cloud access.
3. Patient-Provided Data:
Support for device data, PDFs, and manual uploads.
4. AI Assistant Integration:
An LLM-powered assistant helps patients structure, reconcile, and correct data.
5. Bidirectional Data Exchange:
Patients can send updates or corrections back to providers (optional for acceptance).
6. Anonymized Data Sharing:
Patients can opt to share de-identified data for research, and receive royalties when used by for-profit entities, funding sustainability for the non-profit repository.
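As a rough illustration of the first milestone, the sketch below pulls lab results out of FHIR Bundles from two providers and lets the patient's preferred source win when results overlap. The Bundle and Observation field names follow the FHIR standard; everything else is an assumption for illustration: the function names, the source labels, the sample values, and the rule that two results overlap when they share a code and a date.

```python
from collections import defaultdict


def observations_from_bundle(bundle, source):
    """Extract coded, numeric Observations from a FHIR Bundle (as a dict)."""
    rows = []
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Observation":
            continue
        codings = res.get("code", {}).get("coding", [])
        qty = res.get("valueQuantity")
        if not codings or qty is None:
            continue  # this sketch only handles coded, numeric results
        rows.append({
            "source": source,  # hypothetical label for the sending provider
            "code": (codings[0].get("system"), codings[0].get("code")),
            "when": res.get("effectiveDateTime"),
            "value": qty.get("value"),
            "unit": qty.get("unit"),
        })
    return rows


def reconcile(rows, preferred_sources):
    """Group overlapping results; the patient's source ranking picks the winner."""
    rank = {s: i for i, s in enumerate(preferred_sources)}
    groups = defaultdict(list)
    for r in rows:
        groups[(r["code"], r["when"])].append(r)
    merged = []
    for candidates in groups.values():
        ordered = sorted(candidates, key=lambda r: rank.get(r["source"], len(rank)))
        # Keep the losing versions too, so nothing is silently discarded.
        merged.append({"authoritative": ordered[0], "alternates": ordered[1:]})
    return merged


def hemoglobin(when, value):
    """Build a sample hemoglobin Observation (made-up values)."""
    return {"resource": {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
        "effectiveDateTime": when,
        "valueQuantity": {"value": value, "unit": "g/dL"},
    }}


gp_bundle = {"resourceType": "Bundle", "type": "collection",
             "entry": [hemoglobin("2025-01-10", 13.1)]}
hospital_bundle = {"resourceType": "Bundle", "type": "collection",
                   "entry": [hemoglobin("2025-01-10", 13.4),
                             hemoglobin("2025-03-02", 12.8)]}

rows = (observations_from_bundle(gp_bundle, "gp")
        + observations_from_bundle(hospital_bundle, "hospital"))
merged = reconcile(rows, preferred_sources=["hospital", "gp"])
```

A real importer would need to handle far more than this (non-numeric results, unit conversion, differing code systems, time zones), but the shape is the same: import everything, keep every version, and let the patient's choice determine which one is shown as authoritative.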
Building It: Open and Collaborative
Such a repository should be built and governed by a trusted non-profit, ideally within the Harvard DCI Network.
At CancerBot, we’ve already begun developing a foundation for this through our open-source project CTOMOP (Clinical Trials OMOP) — available on GitHub.
CTOMOP extends the OMOP schema to include richer genomic data (such as assay information) and provides a flattened table structure optimized for clinical trial matching and predictive modeling. It’s fully open source and designed for community collaboration through the DCI Network.
While our initial use case is trial matching, the schema supports broader applications: building predictive models for standard-of-care outcomes, powering AI research, and supporting long-term population health studies.
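To give a feel for what "flattened" means here, the sketch below turns OMOP-style measurement rows (one row per test result) into one wide row per patient, keeping the most recent value of each test. This is not CTOMOP's actual schema or code. The row fields mirror the OMOP MEASUREMENT table, but the concept IDs, column names, and values are hypothetical placeholders.

```python
def flatten_measurements(measurements, column_names):
    """Return {person_id: wide_row} holding the latest value per mapped concept.

    measurements: dicts with person_id, measurement_concept_id,
                  measurement_date (ISO string), value_as_number.
    column_names: hypothetical mapping from concept ID to output column name.
    """
    latest = {}
    for m in measurements:
        col = column_names.get(m["measurement_concept_id"])
        if col is None:
            continue  # concept is not part of the flattened view
        key = (m["person_id"], col)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if key not in latest or m["measurement_date"] > latest[key]["measurement_date"]:
            latest[key] = m

    flat = {}
    for (person_id, col), m in latest.items():
        row = flat.setdefault(person_id, {"person_id": person_id})
        row[col] = m["value_as_number"]
    return flat


# Placeholder concept IDs, not real OMOP concept identifiers.
column_names = {1001: "hemoglobin_g_dl", 1002: "ldh_u_l"}

measurements = [
    {"person_id": 1, "measurement_concept_id": 1001,
     "measurement_date": "2025-01-10", "value_as_number": 13.1},
    {"person_id": 1, "measurement_concept_id": 1001,
     "measurement_date": "2025-03-02", "value_as_number": 12.8},
    {"person_id": 1, "measurement_concept_id": 1002,
     "measurement_date": "2025-02-01", "value_as_number": 210.0},
    {"person_id": 2, "measurement_concept_id": 1001,
     "measurement_date": "2025-02-15", "value_as_number": 14.0},
    {"person_id": 2, "measurement_concept_id": 9999,
     "measurement_date": "2025-02-15", "value_as_number": 1.0},
]

flat = flatten_measurements(measurements, column_names)
```

One row per patient is convenient for trial matching, where eligibility criteria are typically checked against a patient's current values, and for predictive models that expect a fixed set of input columns.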
The Road Ahead
A unified, patient-controlled data repository isn’t just a technical challenge — it’s a moral and practical imperative. Every day, patients lose time, clarity, and even treatment opportunities because their data lives in silos.
We now have the standards, the technology, and the collective will to fix that. What we need next is collaboration — across patients, clinicians, technologists, and policymakers — to make it real.
The future of healthcare data belongs to those who share it responsibly.
Let’s build the infrastructure that makes that possible.
Turning frustration into innovation
After being diagnosed with follicular lymphoma, AI tech entrepreneur Adam Blum assumed he could easily find cutting-edge treatment options. Instead, he faced resistance from doctors and an exhausting search process. Determined to fix this, he built CancerBot, an AI-powered tool that makes clinical trials more accessible, helping patients find potentially life-saving treatments faster.


