Identification of colorectal cancer using structured and free text clinical data

dc.contributor.author: Redd, Douglas F.
dc.contributor.author: Shao, Yijun
dc.contributor.author: Zeng-Treitler, Qing
dc.contributor.author: Myers, Laura J.
dc.contributor.author: Barker, Barry C.
dc.contributor.author: Nelson, Stuart J.
dc.contributor.author: Imperiale, Thomas F.
dc.contributor.department: Medicine, School of Medicine
dc.date.accessioned: 2024-08-19T11:10:21Z
dc.date.available: 2024-08-19T11:10:21Z
dc.date.issued: 2022
dc.description.abstract: Colorectal cancer incidence has continually fallen among those 50 years old and over. However, the incidence has increased in those under 50. Even with the recent screening guidelines recommending that screening begins at age 45, nearly half of all early-onset colorectal cancer will be missed. Methods are needed to identify high-risk individuals in this age group for targeted screening. Colorectal cancer studies, as with other clinical studies, have required labor intensive chart review for the identification of those affected and risk factors. Natural language processing and machine learning can be used to automate the process and enable the screening of large numbers of patients. This study developed and compared four machine learning and statistical models: logistic regression, support vector machine, random forest, and deep neural network, in their performance in classifying colorectal cancer patients. Excellent classification performance is achieved with AUCs over 97%.
dc.eprint.version: Final published version
dc.identifier.citation: Redd DF, Shao Y, Zeng-Treitler Q, et al. Identification of colorectal cancer using structured and free text clinical data. Health Informatics J. 2022;28(4):14604582221134406. doi:10.1177/14604582221134406
dc.identifier.uri: https://hdl.handle.net/1805/42833
dc.language.iso: en_US
dc.publisher: Sage
dc.relation.isversionof: 10.1177/14604582221134406
dc.relation.journal: Health Informatics Journal
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.source: Publisher
dc.subject: Colon cancer
dc.subject: Feature utilization
dc.subject: Machine learning
dc.subject: Model comparison
dc.subject: Statistical models
dc.title: Identification of colorectal cancer using structured and free text clinical data
dc.type: Article
Files
Original bundle
Name: Redd2022Identification-CCBYNC.pdf
Size: 536.02 KB
Format: Adobe Portable Document Format