Using Electronic Health Records to Classify Cancer Site and Metastasis
dc.contributor.author | Kroenke, Kurt | |
dc.contributor.author | Ruddy, Kathryn J. | |
dc.contributor.author | Pachman, Deirdre R. | |
dc.contributor.author | Grzegorczyk, Veronica | |
dc.contributor.author | Herrin, Jeph | |
dc.contributor.author | Rahman, Parvez A. | |
dc.contributor.author | Tobin, Kyle A. | |
dc.contributor.author | Griffin, Joan M. | |
dc.contributor.author | Chlan, Linda L. | |
dc.contributor.author | Austin, Jessica D. | |
dc.contributor.author | Ridgeway, Jennifer L. | |
dc.contributor.author | Mitchell, Sandra A. | |
dc.contributor.author | Marsolo, Keith A. | |
dc.contributor.author | Cheville, Andrea L. | |
dc.contributor.department | Medicine, School of Medicine | |
dc.date.accessioned | 2025-07-15T13:49:53Z | |
dc.date.available | 2025-07-15T13:49:53Z | |
dc.date.issued | 2025 | |
dc.description.abstract | The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. There were challenges in identifying cancer site and metastatic status. This study compares three different approaches to determine cancer site and six strategies for identifying the presence of metastasis using EHR and cancer registry data. The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms (sleep disturbance, pain, physical function impairment, anxiety, depression, and energy deficit/fatigue) were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials. The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. ICD and NLP methods could be applied to the entire cohort and had the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data was available for less than half of the patients. Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for covariates, but likely require refinement for key dependent and independent variables. | |
dc.eprint.version | Final published version | |
dc.identifier.citation | Kroenke K, Ruddy KJ, Pachman DR, et al. Using Electronic Health Records to Classify Cancer Site and Metastasis. Appl Clin Inform. 2025;16(3):556-568. doi:10.1055/a-2544-3117 | |
dc.identifier.uri | https://hdl.handle.net/1805/49485 | |
dc.language.iso | en_US | |
dc.publisher | Thieme | |
dc.relation.isversionof | 10.1055/a-2544-3117 | |
dc.relation.journal | Applied Clinical Informatics | |
dc.rights | Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.source | PMC | |
dc.subject | Neoplasms | |
dc.subject | Cancer site | |
dc.subject | Metastasis | |
dc.subject | Pragmatic clinical trial | |
dc.subject | Electronic health records | |
dc.subject | Natural language processing | |
dc.subject | Cancer registry | |
dc.title | Using Electronic Health Records to Classify Cancer Site and Metastasis | |
dc.type | Article |
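
The abstract describes two computations that a short sketch may make concrete: choosing a patient's cancer site(s) from the most prevalent ICD-10 codes, and measuring agreement between two metastasis indicators with Cohen's kappa. The Python below is an illustration only, not the authors' implementation. The function names `top_k_sites` and `cohen_kappa`, the use of 3-character ICD-10 categories, and the sample data are all assumptions made for the example; the record does not describe the study's multistep process in enough detail to reproduce it.

```python
from collections import Counter


def top_k_sites(codes, k=None):
    """Return the k most frequent codes for one patient.

    codes: list of ICD-10 cancer site codes, one entry per coded encounter
           (hypothetical input shape, assumed for this sketch).
    k:     1 for the single most prevalent code, 2 for the two most
           prevalent codes, None for any diagnostic code.
    """
    counts = Counter(codes)
    if k is None:
        return set(counts)
    return {code for code, _ in counts.most_common(k)}


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of patients on which the two methods agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each method's marginal rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both methods give the same constant label; kappa is undefined,
        # and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up codes for one patient (not study data).
patient_codes = ["C50", "C50", "C50", "C34", "C34", "C18"]
single = top_k_sites(patient_codes, k=1)
top_two = top_k_sites(patient_codes, k=2)
any_code = top_k_sites(patient_codes)

# Made-up metastasis flags from two methods for four patients.
icd_flags = [1, 1, 0, 0]
nlp_flags = [1, 0, 0, 0]
kappa = cohen_kappa(icd_flags, nlp_flags)
```

In this toy example the single-code approach keeps only `C50`, the two-code approach adds `C34`, and the any-code approach also includes `C18`, which mirrors why the narrower approaches detect a subset of the cases found by counting all diagnoses. Ties between equally frequent codes are broken here by first appearance, a choice this sketch makes for simplicity.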