Browsing by Author "Roberts, Anna R."
Item Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data (BMC, 2022-04-19)
Shao, Wei; Luo, Xiao; Zhang, Zuoyi; Han, Zhi; Chandrasekaran, Vasu; Turzhitsky, Vladimir; Bali, Vishal; Roberts, Anna R.; Metzger, Megan; Baker, Jarod; La Rosa, Carmen; Weaver, Jessica; Dexter, Paul; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Background: Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients.
Results: The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster. Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters.
Conclusions: To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.

Item An Automated Line-of-Therapy Algorithm for Adults With Metastatic Non-Small Cell Lung Cancer: Validation Study Using Blinded Manual Chart Review (JMIR Publications, 2021-10-12)
Meng, Weilin; Mosesso, Kelly M.; Lane, Kathleen A.; Roberts, Anna R.; Griffith, Ashley; Ou, Wanmei; Dexter, Paul R.; Biostatistics & Health Data Science, School of Medicine
Background: Extraction of line-of-therapy (LOT) information from electronic health record and claims data is essential for determining longitudinal changes in systemic anticancer therapy in real-world clinical settings.
Objective: The aim of this retrospective cohort analysis is to validate and refine our previously described open-source LOT algorithm by comparing the output of the algorithm with results obtained through blinded manual chart review.
Methods: We used structured electronic health record data and clinical documents to identify 500 adult patients treated for metastatic non-small cell lung cancer with systemic anticancer therapy from 2011 to mid-2018; we assigned patients to training (n=350) and test (n=150) cohorts, randomly divided proportional to the overall ratio of simple:complex cases (n=254:246). Simple cases were patients who received one LOT and no maintenance therapy; complex cases were patients who received more than one LOT and/or maintenance therapy. Algorithmic changes were performed using the training cohort data, after which the refined algorithm was evaluated against the test cohort.
Results: For simple cases, 16 instances of discordance between the LOT algorithm and chart review prerefinement were reduced to 8 instances postrefinement; in the test cohort, there was no discordance between algorithm and chart review.
For complex cases, algorithm refinement reduced the discordance from 68 to 62 instances, with 37 instances in the test cohort. The percentage agreement between LOT algorithm output and chart review for patients who received one LOT was 89% prerefinement, 93% postrefinement, and 93% for the test cohort, whereas the likelihood of precise matching between algorithm output and chart review decreased with an increasing number of unique regimens. Several areas of discordance that arose from differing definitions of LOTs and maintenance therapy could not be objectively resolved because of a lack of precise definitions in the medical literature.
Conclusions: Our findings identify common sources of discordance between the LOT algorithm and clinician documentation, providing the possibility of targeted algorithm refinement.

Item Identifying and Characterizing a Chronic Cough Cohort Through Electronic Health Records (Elsevier, 2021-06)
Weiner, Michael; Dexter, Paul R.; Heithoff, Kim; Roberts, Anna R.; Liu, Ziyue; Griffith, Ashley; Hui, Siu; Schelfhout, Jonathan; Dicpinigaitis, Peter; Doshi, Ishita; Weaver, Jessica P.; Medicine, School of Medicine
Background: Chronic cough (CC) of 8 weeks or more affects about 10% of adults and may lead to expensive treatments and reduced quality of life. Incomplete diagnostic coding complicates identifying CC in electronic health records (EHRs). Natural language processing (NLP) of EHR text could improve detection.
Research Question: Can NLP be used to identify cough in EHRs, and to characterize adults and encounters with CC?
Study Design and Methods: A Midwestern EHR system identified patients aged 18 to 85 years during 2005 to 2015. NLP was used to evaluate text notes, except prescriptions and instructions, for mentions of cough. Two physicians and a biostatistician reviewed 12 sets of 50 encounters each, with iterative refinements, until the positive predictive value for cough encounters exceeded 90%.
NLP, International Classification of Diseases, 10th revision, or medication was used to identify cough. Three encounters spanning 56 to 120 days defined CC. Descriptive statistics summarized patients and encounters, including referrals.
Results: Optimizing NLP required identifying and eliminating cough denials, instructions, and historical references. Of 235,457 cough encounters, 23% had a relevant diagnostic code or medication. Applying chronicity to cough encounters identified 23,371 patients (61% women) with CC. NLP alone identified 74% of these patients; diagnoses or medications alone identified 15%. The positive predictive value of NLP in the reviewed sample was 97%. Referrals for cough occurred for 3.0% of patients; pulmonary medicine was most common initially (64% of referrals).
Limitations: Some patients with diagnosis codes for cough, encounters at intervals greater than 4 months, or multiple acute cough episodes may have been misclassified.
Interpretation: NLP successfully identified a large cohort with CC. Most patients were identified through NLP alone, rather than diagnoses or medications. NLP improved detection of patients nearly sevenfold, addressing the gap in ability to identify and characterize CC disease burden. Nearly all cases appeared to be managed in primary care. Identifying these patients is important for characterizing treatment and unmet needs.

Item Kidney Histopathology and Prediction of Kidney Failure: A Retrospective Cohort Study (Elsevier, 2020-09)
Eadon, Michael T.; Schwantes-An, Tae-Hwi; Phillips, Carrie L.; Roberts, Anna R.; Greene, Colin V.; Hallab, Ayman; Hart, Kyle J.; Lipp, Sarah N.; Perez-Ledezma, Claudio; Omar, Khawaja O.; Kelly, Katherine J.; Moe, Sharon M.; Dagher, Pierre C.; El-Achkar, Tarek M.; Moorthi, Ranjani N.; Medical and Molecular Genetics, School of Medicine
Rationale & objective: The use of kidney histopathology for predicting kidney failure is not established.
We hypothesized that the use of histopathologic features of kidney biopsy specimens would improve prediction of clinical outcomes made using demographic and clinical variables alone.
Study design: Retrospective cohort study and development of a clinical prediction model.
Setting & participants: All 2,720 individuals from the Biopsy Biobank Cohort of Indiana who underwent kidney biopsy between 2002 and 2015 and had at least 2 years of follow-up.
New predictors & established predictors: Demographic variables, comorbid conditions, baseline clinical characteristics, and histopathologic features.
Outcomes: Time to kidney failure, defined as sustained estimated glomerular filtration rate ≤ 10 mL/min/1.73 m².
Analytical approach: Multivariable Cox regression model with internal validation by bootstrapping. Models including clinical and demographic variables were fit with the addition of histopathologic features. To assess the impact of adding a histopathology variable, the amount of variance explained (r²) and the C index were calculated. The impact on prediction was assessed by calculating the net reclassification index for each histopathologic variable and for all combined.
Results: Median follow-up was 3.1 years. Within 5 years of biopsy, 411 (15.1%) patients developed kidney failure. Multivariable analyses including demographic and clinical variables revealed that severe glomerular obsolescence (adjusted HR, 2.03; 95% CI, 1.51-2.03), severe interstitial fibrosis and tubular atrophy (adjusted HR, 1.99; 95% CI, 1.52-2.59), and severe arteriolar hyalinosis (adjusted HR, 1.53; 95% CI, 1.14-2.05) were independently associated with the primary outcome. The addition of all histopathologic variables to the clinical model yielded a net reclassification index for kidney failure of 5.1% (P < 0.001) with a full model C statistic of 0.915. Analyses addressing the competing risk for death, optimism, or shrinkage did not significantly change the results.
Limitations: Selection bias from the use of clinically indicated biopsies and exclusion of patients with less than 2 years of follow-up, as well as reliance on surrogate indicators of kidney failure onset.
Conclusions: A model incorporating histopathologic features from kidney biopsy specimens improved prediction of kidney failure and may be valuable clinically. Future studies will be needed to understand whether even more detailed characterization of kidney tissue may further improve prognostication about the future trajectory of estimated glomerular filtration rate.

Item The synchronicity of COVID-19 disparities: Statewide epidemiologic trends in SARS-CoV-2 morbidity, hospitalization, and mortality among racial minorities and in rural America (PLOS One, 2021-07-20)
Dixon, Brian E.; Grannis, Shaun J.; Lembcke, Lauren R.; Valvi, Nimish; Roberts, Anna R.; Embi, Peter J.; Epidemiology, School of Public Health
Background: Early studies on COVID-19 identified unequal patterns in hospitalization and mortality in urban environments for racial and ethnic minorities. These studies were primarily single-center observational studies conducted within the first few weeks or months of the pandemic. We sought to examine trends in COVID-19 morbidity, hospitalization, and mortality over time for minority and rural populations, especially during the U.S. fall surge.
Methods: Data were extracted from a statewide cohort of all adult residents in Indiana tested for SARS-CoV-2 infection between March 1 and December 31, 2020, linked to electronic health records. Primary measures were per capita rates of infection, hospitalization, and death. Age-adjusted rates were calculated for multiple time periods corresponding to public health mitigation efforts. Comparisons across time within groups were made using ANOVA.
Results: Morbidity and mortality increased over time with notable differences among sub-populations.
Initially, hospitalization rates among racial minorities were 3–4 times higher than whites, and mortality rates among urban residents were twice those of rural residents. By fall 2020, hospitalization and mortality rates in rural areas surpassed those of urban areas, and gaps between black/brown and white populations narrowed. Changes across time among demographic groups were significant for morbidity and hospitalization. Cumulative morbidity and mortality were highest among minority groups and in rural communities.
Conclusions: The synchronicity of disparities in COVID-19 by race and geography suggests that health officials should explicitly measure disparities and adjust mitigation as well as vaccination strategies to protect those sub-populations with greater disease burden.
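
The COVID-19 abstract above reports age-adjusted per capita rates but does not say how they were computed. As a rough illustration only, the sketch below shows direct age standardization, one common way to age-adjust a rate. The choice of method, the function name `direct_age_adjusted_rate`, the age bands, and every number are assumptions made for this example; none of them come from the study.

```python
from typing import Mapping


def direct_age_adjusted_rate(
    events: Mapping[str, int],
    population: Mapping[str, int],
    standard_weights: Mapping[str, float],
    per: int = 100_000,
) -> float:
    """Directly age-standardized rate per `per` people.

    Hypothetical helper: each age band's crude rate (events / population)
    is weighted by that band's share of a reference (standard) population.
    """
    total_weight = sum(standard_weights.values())
    adjusted = 0.0
    for band, weight in standard_weights.items():
        stratum_rate = events[band] / population[band]
        adjusted += stratum_rate * (weight / total_weight)
    return adjusted * per


# Made-up counts for illustration; not data from the study.
events = {"18-39": 10, "40-64": 30, "65+": 60}
population = {"18-39": 10_000, "40-64": 10_000, "65+": 5_000}
standard = {"18-39": 0.5, "40-64": 0.3, "65+": 0.2}

adjusted_rate = direct_age_adjusted_rate(events, population, standard)
crude_rate = sum(events.values()) / sum(population.values()) * 100_000
print(f"crude: {crude_rate:.1f}, age-adjusted: {adjusted_rate:.1f} per 100,000")
```

Because the same standard weights are applied to every group, rates adjusted this way can be compared across groups or time periods whose age structures differ, which is the usual reason for age adjustment.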