Browsing by Subject "data quality"
Now showing 1 - 3 of 3
Item Developing Automated Computer Algorithms to Track Periodontal Disease Change from Longitudinal Electronic Dental Records (MDPI, 2023-03-08)
Patel, Jay S.; Kumar, Krishna; Zai, Ahad; Shin, Daniel; Willis, Lisa; Thyvalikakath, Thankam P.

Objective: To develop two automated computer algorithms that extract information from clinical notes and generate three cohorts of patients (disease improvement, disease progression, and no disease change) to track periodontal disease (PD) change over time using longitudinal electronic dental records (EDR). Methods: We conducted a retrospective study of 28,908 patients who received a comprehensive oral evaluation between 1 January 2009 and 31 December 2014 at Indiana University School of Dentistry (IUSD) clinics. We used various Python libraries, such as Pandas, TensorFlow, and PyTorch, and a natural language toolkit to develop and test the computer algorithms. We tested performance through a manual review process by generating a confusion matrix, and we calculated precision, recall, sensitivity, specificity, and accuracy to evaluate the algorithms. Finally, we evaluated the density of longitudinal EDR data for the following follow-up periods: (1) none; (2) up to 5 years; (3) >5 and ≤10 years; and (4) >10 and ≤15 years. Results: Thirty-four percent (n = 9954) of the study cohort had up to five years of follow-up visits, with an average of 2.78 visits with periodontal charting information. For clinician-documented diagnoses from clinical notes, 42% of patients (n = 5562) had at least two PD diagnoses from which to determine disease change. In this cohort, 72% of patients (n = 3919) had no change in disease status between their first and last visits, 669 (13%) patients' disease progressed, and 589 (11%) patients' disease improved.
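The evaluation described above rests on standard confusion-matrix metrics. A minimal sketch of how precision, recall/sensitivity, specificity, and accuracy are derived from a 2x2 confusion matrix (the counts below are invented for illustration, not taken from the study):

```python
# Hypothetical counts from a manual review of algorithm output:
# tp = true positives, fp = false positives, fn = false negatives, tn = true negatives
tp, fp, fn, tn = 90, 10, 5, 95

precision = tp / (tp + fp)                  # of predicted positives, fraction correct
recall = tp / (tp + fn)                     # sensitivity: of actual positives, fraction found
specificity = tn / (tn + fp)                # of actual negatives, fraction correctly rejected
accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall fraction classified correctly

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

With these invented counts, precision is 0.900 and accuracy is 0.925; the same formulas apply to any binary classification confusion matrix.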
Conclusions: This study demonstrated the feasibility of using longitudinal EDR data to track disease changes over the 15-year observation period. We provide detailed steps and computer algorithms to clean and preprocess EDR data, and we generated three cohorts of patients. This information can now be used to study clinical courses with artificial intelligence and machine learning methods.

Item Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison (IOS, 2019)
Huser, Vojtech; Li, Xiaochun; Zhang, Zuoyi; Jung, Sungjae; Woong Park, Rae; Banda, Juan; Razzaghi, Hanieh; Londhe, Ajit; Natarajan, Karthik; Biostatistics, School of Public Health

Large healthcare datasets of electronic health record data have become indispensable in clinical research, and data quality in such datasets has recently become a focus of many distributed research networks. Although data quality is specific to a given research question, many existing data quality platforms show that general, dataset-level data quality assessment (covering a spectrum of research questions) is possible and highly requested by researchers. We present a comparison of 12 datasets and an extension of the Achilles Heel data quality software tool with new rules and data characterization measures.

Item Quality control questions on Amazon's Mechanical Turk (MTurk): A randomized trial of impact on the USAUDIT, PHQ-9, and GAD-7 (Springer, 2021-08-06)
Agley, Jon; Xiao, Yunyu; Nolan, Rachael; Golzarri-Arroyo, Lilian; School of Social Work

Crowdsourced psychological and other biobehavioral research using platforms such as Amazon's Mechanical Turk (MTurk) is increasingly common but has proliferated more rapidly than studies to establish data quality best practices. This study therefore investigated whether outcome scores for three common screening tools would differ significantly among MTurk workers who were subject to different sets of quality control checks.
We conducted a single-stage, randomized controlled trial with equal allocation to each of four study arms: Arm 1 (Control Arm), Arm 2 (Bot/VPN Check), Arm 3 (Truthfulness/Attention Check), and Arm 4 (Stringent Arm - All Checks). Data collection was completed in Qualtrics, to which participants were referred from MTurk. Subjects (n = 1100) were recruited on November 20-21, 2020. Eligible workers were required to claim U.S. residency, to have a successful task completion rate > 95%, and to have completed between 100 and 10,000 tasks. Participants completed the US-Alcohol Use Disorders Identification Test (USAUDIT), the Patient Health Questionnaire (PHQ-9), and a screener for Generalized Anxiety Disorder (GAD-7).

We found that differing quality control approaches significantly, meaningfully, and directionally affected outcome scores on each of the screening tools. Most notably, workers in Arm 1 (Control) reported higher scores than those in Arms 3 and 4 for all tools, and a higher score than workers in Arm 2 for the PHQ-9. These data suggest that the use, or lack thereof, of quality control questions in crowdsourced research may substantively affect findings, as might the types of quality control items.
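A truthfulness/attention check of the kind used in Arms 3 and 4 is typically a directed-response item whose failures are filtered out before analysis. A minimal sketch under stated assumptions (the field names, the "select Agree" instruction, and the example records are all hypothetical, not taken from the study):

```python
# Hypothetical attention-check filter for crowdsourced survey responses.
# Assumption: workers were instructed to select "Agree" on one directed item;
# responses failing that item are excluded before outcome scores are analyzed.
def passes_attention_check(response: dict) -> bool:
    return response.get("attention_item") == "Agree"

responses = [
    {"worker_id": "A1", "attention_item": "Agree",    "phq9_total": 4},
    {"worker_id": "A2", "attention_item": "Disagree", "phq9_total": 21},
    {"worker_id": "A3", "attention_item": "Agree",    "phq9_total": 9},
]

kept = [r for r in responses if passes_attention_check(r)]
print([r["worker_id"] for r in kept])  # only workers who passed the check
```

Because screening-tool totals are computed only over the retained responses, the choice and stringency of such checks can shift aggregate scores, which is the effect the trial measured.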