Inferring the patient’s age from implicit age clues in health forum posts

dc.contributor.author: Black, Christopher M.
dc.contributor.author: Meng, Weilin
dc.contributor.author: Yao, Lixia
dc.contributor.author: Ben Miled, Zina
dc.contributor.department: Electrical and Computer Engineering, School of Engineering and Technology [en_US]
dc.date.accessioned: 2023-02-09T19:54:31Z
dc.date.available: 2023-02-09T19:54:31Z
dc.date.issued: 2022-01
dc.description.abstract: Broader patient-reported experiences in oncology are largely unknown due to the lack of available information from traditional data sources. Online health community data provide an exploratory way to uncover these experiences at a large scale. Analyzing these data can guide further studies towards understanding patients’ needs and experiences. However, analysis of online health data is inherently difficult due to the unstructured nature of these data and the variety of ways information can be expressed over text. Specifically, subscribers may not disclose critical information such as the age of the patient in their posts. In fact, the number of health forum posts that explicitly mention the age of the patient is significantly lower than the number of posts that do not include this information in the Reddit r/Cancer health forum under consideration in the present paper. Health-focused studies often need to consider or control for age as a confounder, hence the importance of having sufficient age data. This paper presents a methodology that can help classify health forum posts according to four age groups (0–17, 18–39, 40–64 and 65+ years) even when the posts do not contain explicit mention of the age of the patient. First, the subset of the posts that include explicit mention of the age of the patient is identified. Second, the explicit age clues are removed from these posts and used to train the proposed age classifier. The resulting classifier is able to infer the age of the patient using only implicit age clues with an average true positive rate (TPR) of 71%. This TPR is comparable to the average TPR of 69% obtained from human annotations for the same set of posts. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Black, C. M., Meng, W., Yao, L., & Ben Miled, Z. (2022). Inferring the patient’s age from implicit age clues in health forum posts. Journal of Biomedical Informatics, 125, 103976. https://doi.org/10.1016/j.jbi.2021.103976 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/31202
dc.language.iso: en [en_US]
dc.publisher: Elsevier [en_US]
dc.relation.isversionof: 10.1016/j.jbi.2021.103976 [en_US]
dc.relation.journal: Journal of Biomedical Informatics [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.source: Publisher [en_US]
dc.subject: patient age [en_US]
dc.subject: health forums [en_US]
dc.subject: classification [en_US]
dc.title: Inferring the patient’s age from implicit age clues in health forum posts [en_US]
dc.type: Article [en_US]
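The data-preparation steps described in the abstract (find posts with an explicit age, assign one of the four age groups, then remove the explicit clue before training) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression, the function names (age_group, extract_and_mask, average_tpr) and the reading of "average TPR" as the unweighted mean of per-group true positive rates are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for explicit age mentions such as "62 years old",
# "62-year-old", "62 yo" or "age 62". The paper's actual rules are not
# given in this record, so this is only an illustrative assumption.
AGE_PATTERN = re.compile(
    r"\b(?:aged?\s+)?(\d{1,3})\s*-?\s*(?:(?:years?|yrs?)\s*-?\s*old|y/?o)\b"
    r"|\baged?\s+(\d{1,3})\b",
    re.IGNORECASE,
)


def age_group(age):
    """Map an age in years to one of the four groups named in the abstract."""
    if age < 18:
        return "0-17"
    if age < 40:
        return "18-39"
    if age < 65:
        return "40-64"
    return "65+"


def extract_and_mask(post):
    """Return (age group or None, post text with explicit age mentions removed)."""
    match = AGE_PATTERN.search(post)
    if match is None:
        return None, post
    # Exactly one of the two capture groups is set, depending on the branch.
    age = int(match.group(1) or match.group(2))
    masked = AGE_PATTERN.sub("", post)
    masked = re.sub(r"\s{2,}", " ", masked).strip()
    return age_group(age), masked


def average_tpr(true_labels, predicted_labels):
    """Unweighted mean of per-group true positive rates (assumed definition)."""
    rates = []
    for group in sorted(set(true_labels)):
        total = sum(1 for t in true_labels if t == group)
        hits = sum(
            1
            for t, p in zip(true_labels, predicted_labels)
            if t == group and p == group
        )
        rates.append(hits / total)
    return sum(rates) / len(rates)


label, text = extract_and_mask("My dad is 62 years old and was just diagnosed.")
print(label, "|", text)
```

In this sketch the masked text and its label would form one training example, so a classifier trained on such pairs can only rely on implicit clues (here, "My dad").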
Files
Original bundle
Name: Black2022Inferring-CCBY.pdf
Size: 469.45 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission