Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data

Kasthurirathne, Suranga N.; Dixon, Brian E.; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J.

dc.contributor.author	Kasthurirathne, Suranga N.
dc.contributor.author	Dixon, Brian E.
dc.contributor.author	Gichoya, Judy
dc.contributor.author	Xu, Huiping
dc.contributor.author	Xia, Yuni
dc.contributor.author	Mamlin, Burke
dc.contributor.author	Grannis, Shaun J.
dc.contributor.department	Department of Epidemiology, Richard M. Fairbanks School of Public Health	en_US
dc.date.accessioned	2017-05-31T16:42:45Z
dc.date.available	2017-05-31T16:42:45Z
dc.date.issued	2017-05
dc.description.abstract	Objectives: Existing approaches to deriving decision models from plaintext clinical data frequently depend on medical dictionaries as sources of potential features. Prior research suggests that decision models developed using non-dictionary-based feature sourcing approaches and “off the shelf” tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary-based models to models built using features derived from medical dictionaries. Materials and methods: We evaluated the detection of cancer cases from free-text pathology reports using decision models built with combinations of dictionary-based or non-dictionary-based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristic (ROC) curve. Results: Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70% and 90%. The source of features and the feature subset size had no impact on the performance of a decision model. Conclusion: Our study suggests there is little value in leveraging medical dictionaries to extract features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve results comparable to those built using medical dictionaries. Overall, this suggests that existing “off the shelf” approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER)-based feature extraction, automated feature selection, and modeling approaches.	en_US
dc.eprint.version	Author's manuscript	en_US
dc.identifier.citation	Kasthurirathne, S. N., Dixon, B. E., Gichoya, J., Xu, H., Xia, Y., Mamlin, B., & Grannis, S. J. (2017). Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data. Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2017.04.008	en_US
dc.identifier.uri	https://hdl.handle.net/1805/12791
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.relation.isversionof	10.1016/j.jbi.2017.04.008	en_US
dc.relation.journal	Journal of Biomedical Informatics	en_US
dc.rights	Publisher Policy	en_US
dc.source	Author	en_US
dc.subject	public health reporting	en_US
dc.subject	medical dictionaries	en_US
dc.subject	decision models	en_US
dc.title	Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data	en_US
dc.type	Article	en_US
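
The abstract names five metrics used to evaluate each decision model. Below is a minimal plain-Python sketch of how these metrics are conventionally computed from binary labels (1 = cancer, 0 = no cancer). It is not the authors' evaluation code, and the names `ConfusionCounts`, `confusion_counts`, `threshold_metrics`, and `roc_auc` are hypothetical. AUC is computed with the rank (Mann-Whitney) formulation rather than by tracing the ROC curve; for a finite sample, with ties counted as one half, the two give the same value.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted labels with gold-standard labels."""
    tp: int  # cancer reports correctly flagged as cancer
    fp: int  # non-cancer reports wrongly flagged as cancer
    tn: int  # non-cancer reports correctly left unflagged
    fn: int  # cancer reports missed by the model


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally binary labels (1 = cancer, 0 = no cancer) into a confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def threshold_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and PPV.

    Assumes every denominator is non-zero (at least one positive, one negative,
    and one positive prediction).
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "ppv": c.tp / (c.tp + c.fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Returns the probability that a randomly chosen positive report scores higher
    than a randomly chosen negative one, counting ties as one half. Assumes at
    least one positive and one negative label.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)  # hypothetical counts
    print(threshold_metrics(counts))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop in `roc_auc` costs O(positives × negatives), which is fine for illustration. For larger evaluations, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.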

Files

Original bundle

Name:: Kasthurirathne_2017_toward.pdf
Size:: 1.08 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.88 KB
Format:: Item-specific license agreed upon to submission

Collections

Open Access Policy Articles
Department of Computer Science Works
Department of Epidemiology Works