Identifying clinical feature clusters toward predicting stroke in patients with asymptomatic carotid stenosis
Abstract
Machine learning models and feature selection methods are widely used to identify important clinical features in electronic health records (EHR) for disease prediction, but the use of graph neural networks (GNNs) to uncover clinical features associated with a disease remains largely unexplored. In this investigation, we developed a computational method that uses EHR data from Indiana University Medical Hospital to predict stroke in patients with asymptomatic carotid stenosis. We first constructed a clinical feature graph for each patient based on the co-occurrence of features (medications, diagnoses, and results of laboratory tests) in the EHR data within a predefined timeframe (e.g., 6 months before the detection of the disease). We then applied an unsupervised GNN-based clustering approach together with our algorithm to select the clinical feature clusters most important for stroke prediction. The selected clinical features served as the basis for constructing patient representations for prediction, and various supervised learning models were evaluated on these representations. Unlike conventional feature selection methods, our GNN-based feature selection approach relies solely on positive cases. We compared our method against baseline models for stroke prediction and achieved an AUC of 0.87 and an F1 score of 0.80, surpassing all baselines. Additionally, we conducted an ablation study on the amount of EHR data, measured in months, to determine the most effective approach for generating patient clinical feature graphs. By capturing inherent relationships between clinical features with the graph model, our approach offers a promising avenue for advancing disease prediction, particularly where few positive cases are available. Our code can be found on GitHub (https://github.com/xudav001/Identifying-Phenotype-Clusters).
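The graph construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two features "co-occur" when they are recorded on the same day, that the 6-month timeframe is approximated by 180 days, and that edges are weighted by the number of co-occurrences. The function name `build_cooccurrence_graph` and the feature labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import combinations


def build_cooccurrence_graph(records, index_date, window_days=180):
    """Build a weighted co-occurrence graph for one patient.

    records: iterable of (date, feature) pairs from the patient's EHR.
    index_date: date the disease was detected (end of the window, exclusive).
    window_days: look-back window; 180 days is an assumed stand-in for 6 months.

    Returns a dict mapping (feature_a, feature_b) to the number of days on
    which both features were recorded (assumed definition of co-occurrence).
    """
    start = index_date - timedelta(days=window_days)

    # Group the features seen on each day inside the look-back window.
    features_by_day = {}
    for day, feature in records:
        if start <= day < index_date:
            features_by_day.setdefault(day, set()).add(feature)

    # Every unordered pair of features on the same day adds 1 to that edge.
    edges = Counter()
    for features in features_by_day.values():
        for a, b in combinations(sorted(features), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical records for a single patient.
records = [
    (date(2020, 1, 10), "dx:hypertension"),
    (date(2020, 1, 10), "rx:atorvastatin"),
    (date(2020, 3, 5), "dx:hypertension"),
    (date(2020, 3, 5), "rx:atorvastatin"),
    (date(2020, 3, 5), "lab:ldl_high"),
    (date(2019, 1, 1), "dx:diabetes"),  # falls outside the window
]
graph = build_cooccurrence_graph(records, index_date=date(2020, 6, 1))
for edge, weight in sorted(graph.items()):
    print(edge, weight)
```

A graph of this kind, one per patient, is the sort of input that a clustering step could then operate on. Other definitions of co-occurrence (for example, same encounter or same week) would change only the grouping key.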