IU Indianapolis ScholarWorks :: Browsing by Subject "data quality"

Comparative analysis of nuclei isolation methods for brain single-nucleus RNA sequencing
(bioRxiv, 2025-03-26) Kersey, Holly N.; Acri, Dominic J.; Dabin, Luke C.; Hartigan, Kelly; Mustaklem, Richard; Park, Jung Hyun; Kim, Jungsu; Medical and Molecular Genetics, School of Medicine
Single-nucleus RNA sequencing (snRNA-seq) enables the resolution of cellular heterogeneity in complex tissues. snRNA-seq overcomes limitations of traditional single-cell RNA-seq by using nuclei instead of cells, which allows the use of frozen tissues and difficult-to-isolate cell types. Although various nuclei isolation methods have been developed, systematic evaluations of their effects on nuclear integrity and subsequent data quality remain lacking, a critical gap with profound implications for rigor and reproducibility. To address this, we compared three mechanistically distinct nuclei isolation strategies on brain tissue: a sucrose gradient centrifugation-based method, a spin column-based method, and a machine-assisted platform. All methods successfully captured diverse cell types but revealed considerable protocol-dependent differences in cell type proportions, transcriptional homogeneity, and the preservation of cell-type-specific and cell-state-specific markers. Moreover, isolation workflows differentially influenced contamination levels from ambient, mitochondrial, and ribosomal RNAs. Our findings establish nuclei isolation methodology as a critical experimental variable shaping snRNA-seq data quality and biological interpretation.
Motivation: Single-nucleus RNA sequencing (snRNA-seq) has become an essential tool for transcriptomic analysis of complex tissues. However, the quality and efficiency of data generation depend heavily on the method used for nuclear isolation. The existing isolation techniques vary in their ability to preserve nuclear integrity, minimize ambient RNA contamination, and optimize recovery rates. Despite these differences in quality, a systematic comparison of these methods, specifically for brain tissue, is lacking. This gap poses a challenge for researchers in choosing the most suitable approach for their particular experimental requirements. To address this critical issue, our study directly compared three nuclei isolation methods and evaluated their performance in terms of yield, purity, and downstream sequencing quality. By providing a comprehensive assessment, we aim to guide researchers in selecting the most appropriate isolation protocol for their snRNA-seq experiments, ensuring optimal results and advancing the study of complex brain tissues at the single-nucleus level.
Developing Automated Computer Algorithms to Track Periodontal Disease Change from Longitudinal Electronic Dental Records
(MDPI, 2023-03-08) Patel, Jay S.; Kumar, Krishna; Zai, Ahad; Shin, Daniel; Willis, Lisa; Thyvalikakath, Thankam P.
Objective: To develop two automated computer algorithms to extract information from clinical notes, and to generate three cohorts of patients (disease improvement, disease progression, and no disease change) to track periodontal disease (PD) change over time using longitudinal electronic dental records (EDR). Methods: We conducted a retrospective study of 28,908 patients who received a comprehensive oral evaluation between 1 January 2009 and 31 December 2014 at Indiana University School of Dentistry (IUSD) clinics. We utilized various Python libraries, such as Pandas, TensorFlow, and PyTorch, and a natural language tool kit to develop and test computer algorithms. We tested the performance through a manual review process by generating a confusion matrix. We calculated precision, recall, sensitivity, specificity, and accuracy to evaluate the performance of the algorithms. Finally, we evaluated the density of longitudinal EDR data for the following follow-up times: (1) None; (2) Up to 5 years; (3) > 5 and ≤ 10 years; and (4) > 10 and ≤ 15 years. Results: Thirty-four percent (n = 9954) of the study cohort had up to five years of follow-up visits, with an average of 2.78 visits with periodontal charting information. For clinician-documented diagnoses from clinical notes, 42% of patients (n = 5562) had at least two PD diagnoses to determine their disease change. In this cohort, with clinician-documented diagnoses, 72% of patients (n = 3919) did not have a disease status change between their first and last visits, 669 (13%) patients’ disease status progressed, and 589 (11%) patients’ disease improved. Conclusions: This study demonstrated the feasibility of utilizing longitudinal EDR data to track disease changes over 15 years during the observation study period. We provided detailed steps and computer algorithms to clean and preprocess the EDR data and generated three cohorts of patients. This information can now be utilized for studying clinical courses using artificial intelligence and machine learning methods.
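The evaluation measures named in the abstract above (precision, recall, sensitivity, specificity, and accuracy) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code: the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Compute standard metrics from binary confusion-matrix counts.

    tp, fp, fn, tn are true positives, false positives, false negatives,
    and true negatives. Recall and sensitivity are the same quantity.
    """
    recall = _ratio(tp, tp + fn)
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": recall,
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Illustrative counts only (hypothetical, not results from the study).
    for name, value in confusion_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(name, value)
```

Returning `None` for an undefined ratio, rather than raising or returning zero, keeps an empty class from being mistaken for a poor score.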
Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison
(IOS, 2019) Huser, Vojtech; Li, Xiaochun; Zhang, Zuoyi; Jung, Sungjae; Woong Park, Rae; Banda, Juan; Razzaghi, Hanieh; Londhe, Ajit; Natarajan, Karthik; Biostatistics, School of Public Health
Large healthcare datasets of Electronic Health Record data have become indispensable in clinical research. Data quality in such datasets has recently become a focus of many distributed research networks. Although data quality is specific to a given research question, many existing data quality platforms show that general data quality assessment at the dataset level (given a spectrum of research questions) is possible and highly requested by researchers. We present a comparison of 12 datasets and an extension of the Achilles Heel data quality software tool with new rules and data characterization measures.
Quality control questions on Amazon's Mechanical Turk (MTurk): A randomized trial of impact on the USAUDIT, PHQ-9, and GAD-7
(Springer, 2021-08-06) Agley, Jon; Xiao, Yunyu; Nolan, Rachael; Golzarri-Arroyo, Lilian; School of Social Work
Crowdsourced psychological and other biobehavioral research using platforms like Amazon's Mechanical Turk (MTurk) is increasingly common, but it has proliferated more rapidly than studies to establish data quality best practices. Thus, this study investigated whether outcome scores for three common screening tools would be significantly different among MTurk workers who were subject to different sets of quality control checks. We conducted a single-stage, randomized controlled trial with equal allocation to each of four study arms: Arm 1 (Control Arm), Arm 2 (Bot/VPN Check), Arm 3 (Truthfulness/Attention Check), and Arm 4 (Stringent Arm - All Checks). Data collection was completed in Qualtrics, to which participants were referred from MTurk. Subjects (n = 1100) were recruited on November 20-21, 2020. Eligible workers were required to claim U.S. residency, have a successful task completion rate > 95%, have completed a minimum of 100 tasks, and have completed a maximum of 10,000 tasks. Participants completed the US-Alcohol Use Disorders Identification Test (USAUDIT), the Patient Health Questionnaire (PHQ-9), and a screener for Generalized Anxiety Disorder (GAD-7). We found that differing quality control approaches significantly, meaningfully, and directionally affected outcome scores on each of the screening tools. Most notably, workers in Arm 1 (Control) reported higher scores than those in Arms 3 and 4 for all tools, and a higher score than workers in Arm 2 for the PHQ-9. These data suggest that the use, or lack thereof, of quality control questions in crowdsourced research may substantively affect findings, as might the types of quality control items.
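The worker eligibility criteria stated in the abstract above can be written as a single predicate. This is a minimal sketch under stated assumptions: the `Worker` record and its field names are hypothetical and do not correspond to any MTurk or Qualtrics API; only the thresholds (completion rate above 95%, between 100 and 10,000 completed tasks, claimed U.S. residency) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical worker record; field names are illustrative only."""
    claims_us_residency: bool
    completion_rate_pct: float  # successful task completion rate, in percent
    tasks_completed: int


def is_eligible(worker: Worker) -> bool:
    """Apply the eligibility thresholds described in the abstract."""
    return (
        worker.claims_us_residency
        and worker.completion_rate_pct > 95.0          # strictly greater than 95%
        and 100 <= worker.tasks_completed <= 10_000    # minimum 100, maximum 10,000
    )


if __name__ == "__main__":
    # Made-up example workers, not study data.
    print(is_eligible(Worker(True, 98.5, 2_500)))
    print(is_eligible(Worker(True, 95.0, 2_500)))
```

The completion-rate comparison is strict because the abstract specifies "> 95%", so a worker at exactly 95% would not qualify under this reading.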
