Browsing by Author "Luo, Xiao"
Now showing 1 - 10 of 51

A Deep Language Model for Symptom Extraction From Clinical Text and its Application to Extract COVID-19 Symptoms From Social Media (IEEE, 2022)
Luo, Xiao; Gandhi, Priyanka; Storey, Susan; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Patients experience various symptoms when they have either acute or chronic diseases or undergo some treatments for diseases. Symptoms are often indicators of the severity of the disease and the need for hospitalization. Symptoms are often described in free text written as clinical notes in the Electronic Health Records (EHR) and are not integrated with other clinical factors for disease prediction and healthcare outcome management. In this research, we propose a novel deep language model to extract patient-reported symptoms from clinical text. The deep language model integrates syntactic and semantic analysis for symptom extraction and identifies the actual symptoms reported by patients and conditional or negated symptoms. The deep language model can extract both complex and straightforward symptom expressions. We used a real-world clinical notes dataset to evaluate our model and demonstrated that our model achieves superior performance compared to three other state-of-the-art symptom extraction models. We extensively analyzed our model to illustrate its effectiveness by examining each component’s contribution to the model. Finally, we applied our model to a COVID-19 tweets data set to extract COVID-19 symptoms. The results show that our model can identify all the symptoms suggested by CDC ahead of their timeline and many rare symptoms.

Analyzing the Correlations between the Uninsured and Diabetes Prevalence Rates in Geographic Regions in the United States (IEEE, 2017-07)
Luo, Xiao; Computer Information and Graphics Technology, School of Engineering and Technology
The increasing prevalence of diagnosed diabetes has drawn the attention of researchers in recent years.
Research has been done on the correlations between diabetes prevalence and socioeconomic factors, obesity, social behaviors, and so on. Since 2010, diabetes preventive services have been covered under health insurance plans in order to reduce the diabetes burden and control the increase in diabetes prevalence. In this study, a hierarchical clustering model using the Expectation-Maximization algorithm is proposed to investigate the correlations between the uninsured and diabetes prevalence rates in 3142 counties in the United States for the years 2009 to 2013. The results identified geographic disparities in the uninsured and diabetes prevalence rates of individual years and over consecutive years.

Analyzing the symptoms in colorectal and breast cancer patients with or without type 2 diabetes using EHR data (Sage, 2021)
Luo, Xiao; Storey, Susan; Gandhi, Priyanka; Zhang, Zuoyi; Metzger, Megan; Huang, Kun; Computer Information and Graphics Technology, School of Engineering and Technology
This research extracted patient-reported symptoms from free-text EHR notes of colorectal and breast cancer patients and studied the correlation of the symptoms with comorbid type 2 diabetes, race, and smoking status. An NLP framework was developed first to use UMLS MetaMap to extract all symptom terms from the 366,398 EHR clinical notes of 1694 colorectal cancer (CRC) patients and 3458 breast cancer (BC) patients. Semantic analysis and clustering algorithms were then developed to categorize all the relevant symptoms into eight symptom clusters defined by seed terms. After all the relevant symptoms were extracted from the EHR clinical notes, the frequency of the symptoms reported by CRC and BC patients over three time-periods post-chemotherapy was calculated. Logistic regression (LR) was performed with each symptom cluster as the response variable while controlling for diabetes, race, and smoking status.
The results show that the CRC and BC patients with Type 2 Diabetes (T2D) were more likely to report symptoms than CRC and BC patients without T2D over three time-periods in the cancer trajectory. We also found that current smokers were more likely to report anxiety (CRC, BC), neuropathic symptoms (CRC, BC), anxiety (BC), and depression (BC) than non-smokers.

Annotation and Information Extraction of Consumer-Friendly Health Articles for Enhancing Laboratory Test Reporting (American Medical Informatics Association, 2024-01-11)
He, Zhe; Tian, Shubo; Erdengasileng, Arslan; Hanna, Karim; Gong, Yang; Zhang, Zhan; Luo, Xiao; Lustria, Mia Liza A.; Engineering Technology, Purdue School of Engineering and Technology
Viewing laboratory test results is patients' most frequent activity when accessing patient portals, but lab results can be very confusing for patients. Previous research has explored various ways to present lab results, but few studies have attempted to provide tailored information support based on an individual patient's medical context. In this study, we collected and annotated interpretations of textual lab results in 251 health articles about laboratory tests from AHealthyMe.com. Then we evaluated transformer-based language models including BioBERT, ClinicalBERT, RoBERTa, and PubMedBERT for recognizing key terms and their types. Using BioPortal's term search API, we mapped the annotated terms to concepts in major controlled terminologies. Results showed that PubMedBERT achieved the best F1 on both strict and lenient matching criteria. SNOMED CT had the best coverage of the terms, followed by LOINC and ICD-10-CM.
This work lays the foundation for enhancing the presentation of lab results in patient portals by providing patients with contextualized interpretations of their lab results and individualized question prompts that they can, in turn, refer to during physician consults.

Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data (BMC, 2022-04-19)
Shao, Wei; Luo, Xiao; Zhang, Zuoyi; Han, Zhi; Chandrasekaran, Vasu; Turzhitsky, Vladimir; Bali, Vishal; Roberts, Anna R.; Metzger, Megan; Baker, Jarod; La Rosa, Carmen; Weaver, Jessica; Dexter, Paul; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Background: Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients. Results: The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster.
Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters. Conclusions: To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.

Are Recent Terrorism Trends Reflected in Social Media? (IEEE, 2017-10)
Terziyska, Ivana; Shah, Setu; Luo, Xiao; Engineering Technology, School of Engineering and Technology
Social media plays an important role in shaping the beliefs and sentiments of an audience regarding an event. A comparison between public data sets that have holistic features and social media data sets that include more user features would give insight into the spread of misinformation and aspects of events that are reflected in user behavior. In this research, we compare the trends identified in the public data set - Global Terrorism Database (GTD) with the trends reflected through the social media data obtained using the Twitter API. The unsupervised learning algorithm Self-Organizing Map (SOM) is used to identify the features and trends summarized by the clusters. The results show discrepancies in the features and related trends of terrorism events between the GTD data set and the obtained Twitter data set, suggesting some media bias and public perception on terrorism.

Attention Mechanism with BERT for Content Annotation and Categorization of Pregnancy-Related Questions on a Community Q&A Site (IEEE, 2020-12)
Luo, Xiao; Ding, Haoran; Tang, Matthew; Gandhi, Priyanka; Zhang, Zhan; He, Zhe; Engineering Technology, School of Engineering and Technology
In recent years, the social web has been increasingly used for health information seeking, sharing, and subsequent health-related research. Women often use the Internet or social networking sites to seek information related to pregnancy in different stages.
They may ask questions about birth control, trying to conceive, labor, or taking care of a newborn or baby. Classifying different types of questions about pregnancy information (e.g., before, during, and after pregnancy) can inform the design of social media and professional websites for pregnancy education and support. This research aims to investigate the attention mechanism built-in or added on top of the BERT model in classifying and annotating the pregnancy-related questions posted on a community Q&A site. We evaluated two BERT-based models and compared them against the traditional machine learning models for question classification. Most importantly, we investigated two attention mechanisms: the built-in self-attention mechanism of BERT and the additional attention layer on top of BERT for relevant term annotation. The classification performance showed that the BERT-based models worked better than the traditional models, and BERT with an additional attention layer can achieve higher overall precision than the basic BERT model. The results also showed that both attention mechanisms work differently on annotating relevant content, and they could serve as feature selection methods for text mining in general.

Biomedical concept association and clustering using word embeddings (2018-12)
Shah, Setu; Luo, Xiao; El-Sharkawy, Mohamed; King, Brian
Biomedical data exists in the form of journal articles, research studies, electronic health records, care guidelines, etc. While text mining and natural language processing tools have been widely employed across various domains, these are just taking off in the healthcare space. A primary hurdle that makes it difficult to build artificial intelligence models that use biomedical data is the limited amount of labelled data available. Since most models rely on supervised or semi-supervised methods, generating large amounts of pre-processed labelled data that can be used for training purposes becomes extremely costly.
Even for datasets that are labelled, the lack of normalization of biomedical concepts further affects the quality of results produced and limits the application to a restricted dataset. This affects reproducibility of the results and techniques across datasets, making it difficult to deploy research solutions to improve healthcare services. The research presented in this thesis focuses on reducing the need to create labels for biomedical text mining by using unsupervised recurrent neural networks. The proposed method utilizes word embeddings to generate vector representations of biomedical concepts based on semantics and context. Experiments with unsupervised clustering of these biomedical concepts show that concepts that are similar to each other are clustered together. While this clustering captures different synonyms of the same concept, it also captures the similarities between various diseases and the symptoms that those diseases are symptomatic of. To test the performance of the concept vectors on corpora of documents, a document vector generation method that utilizes these concept vectors is also proposed. The document vectors thus generated are used as an input to clustering algorithms, and the results show that across multiple corpora, the proposed methods of concept and document vector generation outperform the baselines and provide more meaningful clustering. The applications of this document clustering are huge, especially in the search and retrieval space, providing clinicians, researchers and patients more holistic and comprehensive results than relying on the exclusive term that they search for. At the end, a framework for extracting clinical information that can be mapped to electronic health records from preventive care guidelines is presented. The extracted information can be integrated with the clinical decision support system of an electronic health record. 
A visualization tool to better understand and observe patient trajectories is also explored. Both these methods have potential to improve the preventive care services provided to patients.

Biostatistics and Health Data Science, School of Medicine (JMIR, 2021-11-25)
Zhang, Zhan; Kmoth, Lukas; Luo, Xiao; He, Zhe; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Background: Personal clinical data, such as laboratory test results, are increasingly being made available to patients via patient portals. However, laboratory test results are presented in a way that is difficult for patients to interpret and use. Furthermore, the indications of laboratory test results may vary among patients with different characteristics and from different medical contexts. To date, little is known about how to design patient-centered technology to facilitate the interpretation of laboratory test results. Objective: The aim of this study is to explore design considerations for supporting patient-centered communication and comprehension of laboratory test results, as well as discussions between patients and health care providers. Methods: We conducted a user-centered, multicomponent design research consisting of user studies, an iterative prototype design, and pilot user evaluations, to explore design concepts and considerations that are useful for supporting patients in not only viewing but also interpreting and acting upon laboratory test results.
Results: The user study results informed the iterative design of a system prototype, which had several interactive features: using graphical representations and clear takeaway messages to convey the concerning nature of the results; enabling users to annotate laboratory test reports; clarifying medical jargon using nontechnical verbiage and allowing users to interact with the medical terms (eg, saving, favoriting, or sorting); and providing pertinent and reliable information to help patients comprehend test results within their medical context. The results of a pilot user evaluation with 8 patients showed that the new patient-facing system was perceived as useful in not only presenting laboratory test results to patients in a meaningful way but also facilitating in situ patient-provider interactions. Conclusions: We draw on our findings to discuss design implications for supporting patient-centered communication of laboratory test results and how to make technology support informative, trustworthy, and empathetic.

Community Studies of Antisemitism in Schools (CSAIS) Community Typology Explorer (2021)
Price, Jeremy F.; Wilson, Jeffrey S.; Schall, Carly E.; Snorten, Clifton L.; Hasan, Mohammad A.; Luo, Xiao; Jahin, S. M. Abrar
This is a companion document to the CSAIS (Community Studies of Antisemitism In Schools) Community Typology Explorer which can be found at https://jeremyfprice.github.io/csais-dashboard/. Details about specific incidents, communities, and community types can be found at the CSAIS Community Typology Explorer. This project utilizes data from the ADL H.E.A.T. Map between 2016 and 2019 to identify incidents of antisemitism that specifically took place in schools. These incidents in schools are influenced by demographic, historical, social, and political factors. This project brings this data together to construct a community typology at the national level.
This typology will provide insight into the ways that school-based incidents of hate are enacted and reported in context. Developing a community typology will allow providers to better target specific demographic, historical, and political attributes of the communities in which these incidents occur through curriculum and learning experiences.