Browsing by Author "Zhang, Zuoyi"
Now showing 1 - 10 of 13
Item Analyzing the symptoms in colorectal and breast cancer patients with or without type 2 diabetes using EHR data (Sage, 2021) Luo, Xiao; Storey, Susan; Gandhi, Priyanka; Zhang, Zuoyi; Metzger, Megan; Huang, Kun; Computer Information and Graphics Technology, School of Engineering and Technology
This research extracted patient-reported symptoms from free-text EHR notes of colorectal and breast cancer patients and studied the correlation of the symptoms with comorbid type 2 diabetes, race, and smoking status. An NLP framework was developed first to use UMLS MetaMap to extract all symptom terms from the 366,398 EHR clinical notes of 1694 colorectal cancer (CRC) patients and 3458 breast cancer (BC) patients. Semantic analysis and clustering algorithms were then developed to categorize all the relevant symptoms into eight symptom clusters defined by seed terms. After all the relevant symptoms were extracted from the EHR clinical notes, the frequency of the symptoms reported by CRC and BC patients over three time-periods post-chemotherapy was calculated. Logistic regression (LR) was performed with each symptom cluster as the response variable while controlling for diabetes, race, and smoking status. The results show that the CRC and BC patients with Type 2 Diabetes (T2D) were more likely to report symptoms than CRC and BC patients without T2D over three time-periods in the cancer trajectory.
We also found that current smokers were more likely to report anxiety (CRC, BC), neuropathic symptoms (CRC, BC), and depression (BC) than non-smokers.

Item Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data (BMC, 2022-04-19) Shao, Wei; Luo, Xiao; Zhang, Zuoyi; Han, Zhi; Chandrasekaran, Vasu; Turzhitsky, Vladimir; Bali, Vishal; Roberts, Anna R.; Metzger, Megan; Baker, Jarod; La Rosa, Carmen; Weaver, Jessica; Dexter, Paul; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Background: Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients. Results: The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster. Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters.
Conclusions: To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.

Item Completeness and timeliness of notifiable disease reporting: a comparison of laboratory and provider reports submitted to a large county health department (Springer Nature, 2017-06-23) Dixon, Brian E.; Zhang, Zuoyi; Lai, Patrick T. S.; Kirbiyik, Uzay; Williams, Jennifer; Hills, Rebecca; Revere, Debra; Gibson, P. Joseph; Grannis, Shaun J.; BioHealth Informatics, School of Informatics and Computing
BACKGROUND: Most public health agencies expect reporting of diseases to be initiated by hospital, laboratory or clinic staff even though so-called passive approaches are known to be burdensome for reporters and produce incomplete as well as delayed reports, which can hinder assessment of disease and delay recognition of outbreaks. In this study, we analyze patterns of reporting as well as data completeness and timeliness for traditional, passive reporting of notifiable disease by two distinct sources of information: hospital and clinic staff versus clinical laboratory staff. Reports were submitted via fax machine as well as electronic health information exchange interfaces. METHODS: Data were extracted from all submitted notifiable disease reports for seven representative diseases. Reporting rates are the proportion of known cases having a corresponding case report from a provider, a faxed laboratory report or an electronic laboratory report. Reporting rates were stratified by disease and compared using McNemar's test. For key data fields on the reports, completeness was calculated as the proportion of non-blank fields. Timeliness was measured as the difference between date of laboratory confirmed diagnosis and the date the report was received by the health department.
Differences in completeness and timeliness by data source were evaluated using a generalized linear model with Pearson's goodness of fit statistic. RESULTS: We assessed 13,269 reports representing 9034 unique cases. Reporting rates varied by disease with overall rates of 19.1% for providers and 84.4% for laboratories (p < 0.001). All but three of 15 data fields in provider reports were more often complete than those fields within laboratory reports (p < 0.001). Laboratory reports, whether faxed or electronically sent, were received, on average, 2.2 days after diagnosis versus a week for provider reports (p < 0.001). CONCLUSIONS: Despite growth in the use of electronic methods to enhance notifiable disease reporting, there still exists much room for improvement.

Item Differences in Health-Related Outcomes and Health Care Resource Utilization in Breast Cancer Survivors With and Without Type 2 Diabetes (AdvocateAuroraHealth, 2022-01-17) Storey, Susan; Zhang, Zuoyi; Luo, Xiao; Metzger, Megan; Jakka, Amrutha Ravali; Huang, Kun; Von Ah, Diane; School of Nursing
Purpose: Up to 74% of breast cancer survivors (BCS) have at least one preexisting comorbid condition, with diabetes (type 2) common. The purpose of this study was to examine differences in health-related outcomes (anemia, neutropenia, and infection) and utilization of health care resources (inpatient, outpatient, and emergency visits) in BCS with and without diabetes. Methods: In this retrospective cohort study, data were leveraged from the electronic health records of a large health network linked to the Indiana State Cancer Registry. BCS diagnosed between January 2007 and December 2017 and who had received chemotherapy were included. Multivariable logistic regression and generalized linear models were used to determine differences in health outcomes and health care resources. Results: The cohort included 6851 BCS, of whom 1121 (16%) had a diagnosis of diabetes.
BCS were, on average, 55 (standard deviation: 11.88) years old, the majority self-reported race as White (90%), and 48.8% had stage II breast cancer. BCS with diabetes were significantly older (mean age of 60.6 [SD: 10.34] years) than those without diabetes and were often obese (66% had body mass index of ≥33). BCS with diabetes had higher odds of anemia (odds ratio: 1.43; 95% CI: 1.04, 1.96) and infection (odds ratio: 1.86; 95% CI: 1.35, 2.55) and utilized more outpatient resources (P < 0.0001). Conclusions: Diabetes has a deleterious effect on health-related outcomes and health care resource utilization among BCS. These findings support the need for clinical practice guidelines to help clinicians manage diabetes among BCS throughout the cancer trajectory and for coordinated models of care to reduce high resource utilization.

Item Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison (IOS, 2019) Huser, Vojtech; Li, Xiaochun; Zhang, Zuoyi; Jung, Sungjae; Woong Park, Rae; Banda, Juan; Razzaghi, Hanieh; Londhe, Ajit; Natarajan, Karthik; Biostatistics, School of Public Health
Large healthcare datasets of Electronic Health Record data have become indispensable in clinical research. Data quality in such datasets has recently become a focus of many distributed research networks. Although data quality is specific to a given research question, many existing data quality platforms show that general data quality assessment at the dataset level (given a spectrum of research questions) is possible and highly requested by researchers.
We present a comparison of 12 datasets and an extension of the Achilles Heel data quality software tool with new rules and data characterization measures.

Item Gonorrhea testing, morbidity, and reporting using an integrated sexually transmitted disease registry in Indiana: 2004-2016 (Sage, 2021-01) Ojo, Opeyemi C.; Arno, Janet N.; Tao, Guoyu; Patel, Chirag G.; Zhang, Zuoyi; Wang, Jane; Holderman, Justin; Dixon, Brian E.; Medicine, School of Medicine
Background: Surveillance of gonorrhea (GC), the second most common notifiable disease in the United States, depends on case reports. Population-level data that contain the number of individuals tested in addition to morbidity are lacking. Methods: We performed a cross-sectional analysis of data obtained from individuals tested for GC recorded in an STD registry. Descriptive statistics were performed, and a Poisson generalized linear model was used to evaluate the number of individuals tested for GC and the positivity rate. GC cases from a subset of the registry were compared with those reported to the CDC to determine the completeness of the registry. Results: A total of 1,870,811 GC tests were linked to 627,870 unique individuals. Individuals tested for GC increased from 54,334 in 2004 to 269,701 in 2016; likewise, GC cases increased from 2,039 to 5,997. However, positivity rate decreased from 3.75% in 2004 to 2.22% in 2016. The difference between the number of GC cases captured by the registry and the number reported to the CDC was not statistically significant (P = 0.0665). Conclusions: Population-level data from an STD registry combining electronic medical records and public health case data may inform STD control efforts. In Indiana, increased testing rates appeared to correlate with increased GC morbidity.

Item Improving Notifiable Disease Case Reporting Through Electronic Information Exchange–Facilitated Decision Support: A Controlled Before-and-After Trial (Sage, 2020) Dixon, Brian E.; Zhang, Zuoyi; Arno, Janet N.; Revere, Debra; Gibson, P. Joseph; Grannis, Shaun J.; Epidemiology, School of Public Health
Objective: Outbreak detection and disease control may be improved by simplified, semi-automated reporting of notifiable diseases to public health authorities. The objective of this study was to determine the effect of an electronic, prepopulated notifiable disease report form on case reporting rates by ambulatory care clinics to public health authorities. Methods: We conducted a 2-year (2012-2014) controlled before-and-after trial of a health information exchange (HIE) intervention in Indiana designed to prepopulate notifiable disease reporting forms to providers. We analyzed data collected from electronic prepopulated reports and "usual care" (paper, fax) reports submitted to a local health department for 7 conditions by using a difference-in-differences model. Primary outcomes were changes in reporting rates, completeness, and timeliness between intervention and control clinics. Results: Provider reporting rates for chlamydia and gonorrhea in intervention clinics increased significantly from 56.9% and 55.6%, respectively, during the baseline period (2012) to 66.4% and 58.3%, respectively, during the intervention period (2013-2014); they decreased from 28.8% and 27.5%, respectively, to 21.7% and 20.6%, respectively, in control clinics (P < .001). Completeness improved from baseline to intervention for 4 of 15 fields in reports from intervention clinics (P < .001), although mean completeness improved for 11 fields in both intervention and control clinics. Timeliness improved for both intervention and control clinics; however, reports from control clinics were timelier (mean, 7.9 days) than reports from intervention clinics (mean, 9.7 days). Conclusions: Electronic, prepopulated case reporting forms integrated into providers' workflow, enabled by an HIE network, can be effective in increasing notifiable disease reporting rates and completeness of information.
However, it was difficult to assess the effect of using the forms for diseases with low prevalence (eg, salmonellosis, histoplasmosis).

Item Initial uptake, time to treatment, and real-world effectiveness of all-oral direct-acting antivirals for hepatitis C virus infection in the United States: A retrospective cohort analysis (PLOS, 2019-08-22) Kwo, Paul Y.; Puenpatom, Amy; Zhang, Zuoyi; Hui, Siu L.; Kelley, Andrea A.; Muschi, David; Biostatistics, School of Public Health
BACKGROUND: Data on initiation and utilization of direct-acting antiviral therapies for hepatitis C virus infection in the United States are limited. This study evaluated treatment initiation, time to treatment, and real-world effectiveness of direct-acting antiviral therapy in individuals with hepatitis C virus infection treated during the first 2 years of availability of all-oral direct-acting antiviral therapies. METHODS: A retrospective cohort analysis was undertaken using electronic medical records and chart review abstraction of hepatitis C virus-infected individuals aged >18 years diagnosed with chronic hepatitis C virus infection between January 1, 2014, and December 31, 2015 from the Indiana University Health database. RESULTS: Eight hundred thirty people initiated direct-acting antiviral therapy during the 2-year observation window. The estimated incidence of treatment initiation was 8.8%±0.34% at the end of year 1 and 15.0%±0.5% at the end of year 2. Median time to initiating therapy was 300 days. Using a Cox regression analysis, positive predictors of treatment initiation included age (hazard ratio, 1.008), prior hepatitis C virus treatment (1.74), cirrhosis (2.64), and history of liver transplant (1.5). History of drug abuse (0.43), high baseline alanine aminotransferase levels (0.79), hepatitis B virus infection (0.41), and self-pay (0.39) were negatively associated with treatment initiation.
In the evaluable population (n = 423), 83.9% (95% confidence interval, 80.1-87.3%) of people achieved sustained virologic response. CONCLUSION: In the early years of the direct-acting antiviral era, <10% of people diagnosed with chronic hepatitis C virus infection received direct-acting antiviral treatment; median time to treatment initiation was 300 days. Future analyses should evaluate time to treatment initiation among those with less advanced fibrosis.

Item An Integrated Surveillance System to Examine Testing, Services, and Outcomes for Sexually Transmitted Diseases (IOS, 2017) Dixon, Brian E.; Tao, Guoyu; Wang, Jane; Tu, Wanzhu; Hoover, Sarah; Zhang, Zuoyi; Batteiger, Teresa A.; Arno, Janet N.; Epidemiology, School of Public Health
Despite laws that require reporting of sexually transmitted diseases (STDs) to governmental health agencies, integrated surveillance of STDs remains challenging. Data and information about testing are fragmented from information on treatment and outcomes. To overcome this fragmentation, data from multiple electronic systems spanning clinical and public health environments were integrated to create an STD surveillance registry. Electronic health records, disease case records, and birth registry records were linked and then stored in a de-identified, secure server for use by health officials and researchers. The registry contains nearly 6 million tests for 628,138 individuals over a 12-year period. The registry supports efforts to understand the epidemiology of STDs as well as health services and outcomes for those diagnosed with STDs.
Specialized disease registries hold promise for collaboration across clinical and public health domains to improve surveillance efforts, reduce health disparities, and increase prevention efforts at the local level.

Item Predictive Modeling of Hypoglycemia for Clinical Decision Support in Evaluating Outpatients with Diabetes Mellitus (Taylor & Francis, 2019) Li, Xiaochun; Yu, Shengsheng; Zhang, Zuoyi; Radican, Larry; Cummins, Jonathan; Engel, Samuel S.; Iglay, Kristy; Duke, Jon; Baker, Jarod; Brodovicz, Kimberly G.; Naik, Ramachandra G.; Leventhal, Jeremy; Chatterjee, Arnaub K.; Rajpathak, Swapnil; Weiner, Michael; Biostatistics, School of Public Health
Objective: Hypoglycemia occurs in 20–60% of patients with diabetes mellitus. Identifying at-risk patients can facilitate interventions to lower risk. We sought to develop a hypoglycemia prediction model. Methods: In this retrospective cohort study, urban adults prescribed a diabetes drug between 2004 and 2013 were identified. Demographic and clinical data were extracted from an electronic medical record (EMR). Laboratory tests, diagnostic codes and natural language processing (NLP) identified hypoglycemia. We compared multiple logistic regression, classification and regression trees (CART), and random forest. Models were evaluated on an independent test set or through cross-validation. Results: The 38,780 patients had mean age 57 years; 56% were female, 40% African-American and 39% uninsured. Hypoglycemia occurred in 8128 (539 identified only by NLP). In logistic regression, factors positively associated with hypoglycemia included infection, non-long-acting insulin, dementia and recent hypoglycemia. Negatively associated factors included long-acting insulin plus sulfonylurea, and age 75 or older. The models’ area under the curve was similar (logistic regression, 89%; CART, 88%; random forest, 90%, with ten-fold cross-validation). Conclusions: NLP improved identification of hypoglycemia.
Non-long-acting insulin was an important risk factor. Decreased risk with age may reflect treatment or diminished awareness of hypoglycemia. More complex models did not improve prediction.