Using Electronic Health Records to Classify Cancer Site and Metastasis

dc.contributor.author: Kroenke, Kurt
dc.contributor.author: Ruddy, Kathryn J.
dc.contributor.author: Pachman, Deirdre R.
dc.contributor.author: Grzegorczyk, Veronica
dc.contributor.author: Herrin, Jeph
dc.contributor.author: Rahman, Parvez A.
dc.contributor.author: Tobin, Kyle A.
dc.contributor.author: Griffin, Joan M.
dc.contributor.author: Chlan, Linda L.
dc.contributor.author: Austin, Jessica D.
dc.contributor.author: Ridgeway, Jennifer L.
dc.contributor.author: Mitchell, Sandra A.
dc.contributor.author: Marsolo, Keith A.
dc.contributor.author: Cheville, Andrea L.
dc.contributor.department: Medicine, School of Medicine
dc.date.accessioned: 2025-07-15T13:49:53Z
dc.date.available: 2025-07-15T13:49:53Z
dc.date.issued: 2025
dc.description.abstract: The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. There were challenges in identifying cancer site and metastatic status. This study compares three different approaches to determine cancer site and six strategies for identifying the presence of metastasis using EHR and cancer registry data. The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials. The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. ICD and NLP methods could be applied to the entire cohort and had the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data was available for less than half of the patients. Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for covariates, but likely require refinement for key dependent and independent variables.
dc.eprint.version: Final published version
dc.identifier.citation: Kroenke K, Ruddy KJ, Pachman DR, et al. Using Electronic Health Records to Classify Cancer Site and Metastasis. Appl Clin Inform. 2025;16(3):556-568. doi:10.1055/a-2544-3117
dc.identifier.uri: https://hdl.handle.net/1805/49485
dc.language.iso: en_US
dc.publisher: Thieme
dc.relation.isversionof: 10.1055/a-2544-3117
dc.relation.journal: Applied Clinical Informatics
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: PMC
dc.subject: Neoplasms
dc.subject: Cancer site
dc.subject: Metastasis
dc.subject: Pragmatic clinical trial
dc.subject: Electronic health records
dc.subject: Natural language processing
dc.subject: Cancer registry
dc.title: Using Electronic Health Records to Classify Cancer Site and Metastasis
dc.type: Article
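
The abstract compares cancer site assignment by the single most prevalent ICD-10 code, the two most prevalent codes, and any code, and reports agreement between approaches as kappa. The sketch below illustrates that comparison on invented data. It is not the authors' multistep process: the patient records, the truncation of codes to three-character ICD-10 categories, and the helper names `top_k_sites`, `site_flags`, and `cohen_kappa` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical per-patient lists of cancer site codes (three-character
# ICD-10 categories), one entry per coded encounter. Invented for illustration.
patients = {
    "p1": ["C50", "C50", "C34"],
    "p2": ["C34", "C34", "C34"],
    "p3": ["C18", "C50", "C18", "C34"],
    "p4": ["C50"],
}

def top_k_sites(codes, k=None):
    """Return the k most prevalent codes for one patient.

    k=None keeps every code (the "any diagnostic code" approach).
    Ties are broken by first appearance, which is how Counter.most_common
    orders equal counts.
    """
    return {code for code, _ in Counter(codes).most_common(k)}

def site_flags(site, k=None):
    """One 0/1 flag per patient: is `site` among that patient's top-k codes?"""
    return [1 if site in top_k_sites(codes, k) else 0 for codes in patients.values()]

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of 0/1 labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)

# Compare the three approaches for one site (C34).
top1 = site_flags("C34", k=1)
top2 = site_flags("C34", k=2)
any_code = site_flags("C34", k=None)

print("top-1 vs any:", cohen_kappa(top1, any_code))
print("top-2 vs any:", cohen_kappa(top2, any_code))
```

With real data, a library routine such as scikit-learn's `cohen_kappa_score` would be a reasonable substitute for the hand-written `cohen_kappa`; the manual version is used here only to keep the sketch dependency-free.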
Files
Original bundle
Name: Kroenke2025Using-CCBY.pdf
Size: 4.66 MB
Format: Adobe Portable Document Format