Browsing by Author "Biostatistics and Health Data Science, School of Medicine"
Item A Bayesian phase I/II biomarker-based design for identifying subgroup-specific optimal dose for immunotherapy (Sage, 2022)
Guo, Beibei; Zang, Yong; Biostatistics and Health Data Science, School of Medicine
Immunotherapy is an innovative treatment that enlists the patient’s immune system to battle tumors. The optimal dose for treating patients with an immunotherapeutic agent may differ according to their biomarker status. In this article, we propose a biomarker-based phase I/II dose-finding design for identifying subgroup-specific optimal dose for immunotherapy (BSOI) that jointly models the immune response, toxicity, and efficacy outcomes. We propose parsimonious yet flexible models to borrow information across different types of outcomes and subgroups. We quantify the desirability of the dose using a utility function and adopt a two-stage dose-finding algorithm to find the optimal dose for each subgroup. Simulation studies show that the BSOI design has desirable operating characteristics in selecting the subgroup-specific optimal doses and allocating patients to those optimal doses, and outperforms conventional designs.

Item A Comparative Analysis of Oral Health and Self-Rated Health: ‘All of Us Research Program’ vs. ‘Health and Retirement Study’ (MDPI, 2024-09-13)
Weintraub, Jane A.; Moss, Kevin L.; Finlayson, Tracy L.; Jones, Judith A.; Preisser, John S.; Biostatistics and Health Data Science, School of Medicine
Poor oral health can impact overall health. This study assessed the association between dental factors (dentate status and dental utilization) and self-rated health (S-RH) among older adults in two cross-sectional datasets: (1) NIH "All of Us (AoU) Research Program" (May 2018-July 2022 release) and (2) U.S. nationally representative "Health and Retirement Study" (HRS) 2018 wave.
Participants aged ≥ 51 years were included in these analyses if (1) from AoU, they had clinical dental and medical data from electronic health records (EHRs) and surveys (n = 5480), and (2) from HRS, they had dental and socio-demographic survey data (n = 14,358). S-RH was dichotomized (fair/poor vs. better) and analyzed with logistic regression. The respective analyses were standardized to the U.S. population using sample survey weights for HRS and, for AoU, stratification and averaging of results according to the weighted HRS race-ethnicity and age distribution. Fair/poor S-RH was reported by 32.6% in AoU and 28.6% in HRS. Dentate status information was available from 7.7% of AoU EHRs. In population-standardized analyses, lack of dental service use increased odds of fair/poor S-RH in AoU, OR (95% CI) = 1.28 (1.11-1.48), and in HRS, OR (95% CI) = 1.45 (1.09-1.94), as did having diabetes, less education, and ever being a smoker. Having no natural teeth was not statistically associated with fair/poor S-RH. Lack of dental service was positively associated with fair/poor S-RH in both datasets. More and better oral health information is needed in AoU and HRS.

Item A Deep Language Model for Symptom Extraction From Clinical Text and its Application to Extract COVID-19 Symptoms From Social Media (IEEE, 2022)
Luo, Xiao; Gandhi, Priyanka; Storey, Susan; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Patients experience various symptoms when they have either acute or chronic diseases or undergo some treatments for diseases. Symptoms are often indicators of the severity of the disease and the need for hospitalization. Symptoms are often described in free text written as clinical notes in the Electronic Health Records (EHR) and are not integrated with other clinical factors for disease prediction and healthcare outcome management. In this research, we propose a novel deep language model to extract patient-reported symptoms from clinical text.
The deep language model integrates syntactic and semantic analysis for symptom extraction and identifies both the actual symptoms reported by patients and conditional or negated symptoms. The deep language model can extract both complex and straightforward symptom expressions. We used a real-world clinical notes dataset to evaluate our model and demonstrated that our model achieves superior performance compared to three other state-of-the-art symptom extraction models. We extensively analyzed our model to illustrate its effectiveness by examining each component’s contribution to the model. Finally, we applied our model to a COVID-19 tweet dataset to extract COVID-19 symptoms. The results show that our model can identify all the symptoms suggested by CDC ahead of their timeline and many rare symptoms.

Item A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies (Springer Nature, 2022)
Li, Zilin; Li, Xihao; Zhou, Hufeng; Gaynor, Sheila M.; Selvaraj, Margaret Sunitha; Arapoglou, Theodore; Quick, Corbin; Liu, Yaowu; Chen, Han; Sun, Ryan; Dey, Rounak; Arnett, Donna K.; Auer, Paul L.; Bielak, Lawrence F.; Bis, Joshua C.; Blackwell, Thomas W.; Blangero, John; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer A.; Cade, Brian E.; Conomos, Matthew P.; Correa, Adolfo; Cupples, L. Adrienne; Curran, Joanne E.; de Vries, Paul S.; Duggirala, Ravindranath; Franceschini, Nora; Freedman, Barry I.; Göring, Harald H. H.; Guo, Xiuqing; Kalyani, Rita R.; Kooperberg, Charles; Kral, Brian G.; Lange, Leslie A.; Lin, Bridget M.; Manichaikul, Ani; Manning, Alisa K.; Martin, Lisa W.; Mathias, Rasika A.; Meigs, James B.; Mitchell, Braxton D.; Montasser, May E.; Morrison, Alanna C.; Naseri, Take; O'Connell, Jeffrey R.; Palmer, Nicholette D.; Peyser, Patricia A.; Psaty, Bruce M.; Raffield, Laura M.; Redline, Susan; Reiner, Alexander P.; Reupena, Muagututi'a Sefuiva; Rice, Kenneth M.; Rich, Stephen S.; Smith, Jennifer A.; Taylor, Kent D.; Taub, Margaret A.; Vasan, Ramachandran S.; Weeks, Daniel E.; Wilson, James G.; Yanek, Lisa R.; Zhao, Wei; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Lipids Working Group; Rotter, Jerome I.; Willer, Cristen J.; Natarajan, Pradeep; Peloso, Gina M.; Lin, Xihong; Biostatistics and Health Data Science, School of Medicine
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability to analyze the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations.
We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.

Item A reference-free R-learner for treatment recommendation (Sage, 2023)
Zhou, Junyi; Zhang, Ying; Tu, Wanzhu; Biostatistics and Health Data Science, School of Medicine
Assigning optimal treatments to individual patients based on their characteristics is the ultimate goal of precision medicine. Deriving evidence-based recommendations from observational data while considering the causal treatment effects and patient heterogeneity is a challenging task, especially in situations of multiple treatment options. Herein, we propose a reference-free R-learner based on a simplex algorithm for treatment recommendation. We showed through extensive simulation that the proposed method produced accurate recommendations that corresponded to optimal treatment outcomes, regardless of the reference group. We used the method to analyze data from the Systolic Blood Pressure Intervention Trial (SPRINT) and achieved recommendations consistent with the current clinical guidelines.

Item A sequential Monte Carlo Gibbs coupled with stochastically approximated expectation-maximization algorithm for functional data (International Press, 2022-01-11)
Liu, Ziyue; Biostatistics and Health Data Science, School of Medicine
We develop an algorithm to overcome the curse of dimensionality in sequential Monte Carlo (SMC) for functional data. In the inner iterations of the algorithm for given parameter values, the conditional SMC is extended to obtain draws of the underlying state vectors. These draws in turn are used in the outer iterations to update the parameter values in the framework of stochastically approximated expectation-maximization to obtain maximum likelihood estimates of the parameters.
Standard errors of the parameters are calculated using a stochastic approximation of Louis' formula. Three numeric examples are used for illustration. They show that although the computational burden remains high, the algorithm produces reasonable results without exponentially increasing the particle numbers.

Item Accurate identification of circRNA landscape and complexity reveals their pivotal roles in human oligodendroglia differentiation (BMC, 2022-02-07)
Li, Yangping; Wang, Feng; Teng, Peng; Ku, Li; Chen, Li; Feng, Yue; Yao, Bing; Biostatistics and Health Data Science, School of Medicine
Background: Circular RNAs (circRNAs), a novel class of poorly conserved non-coding RNAs that regulate gene expression, are highly enriched in the human brain. Despite increasing discoveries of circRNA function in human neurons, the circRNA landscape and function in developing human oligodendroglia, the myelinating cells that govern neuronal conductance, remain unexplored. Meanwhile, improved experimental and computational tools for the accurate identification of circRNAs are needed. Results: We adopt a published experimental approach for circRNA enrichment and develop CARP (CircRNA identification using A-tailing RNase R approach and Pseudo-reference alignment), a comprehensive 21-module computational framework for accurate circRNA identification and quantification. Using CARP, we identify developmentally programmed human oligodendroglia circRNA landscapes in the HOG oligodendroglioma cell line, distinct from neuronal circRNA landscapes. Numerous circRNAs display oligodendroglia-specific regulation upon differentiation, among which a subclass is regulated independently from their parental mRNAs. We find that circRNA flanking introns often contain cis-regulatory elements for RNA editing and are predicted to bind differentiation-regulated splicing factors.
In addition, we discover novel oligodendroglia-specific circRNAs that are predicted to sponge microRNAs, which co-operatively promote oligodendroglia development. Furthermore, we identify circRNA clusters derived from differentiation-regulated alternative circularization events within the same gene, each containing a common circular exon, achieving additive sponging effects that promote human oligodendroglia differentiation. Conclusions: Our results reveal dynamic regulation of human oligodendroglia circRNA landscapes during early differentiation and suggest critical roles of the circRNA-miRNA-mRNA axis in advancing human oligodendroglia development.

Item ADAM8 is expressed widely in breast cancer and predicts poor outcome in hormone receptor positive, HER-2 negative patients (BMC, 2023-08-11)
Pianetti, Stefania; Miller, Kathy D.; Chen, Hannah H.; Althouse, Sandra; Cao, Sha; Michael, Steven J.; Sonenshein, Gail E.; Mineva, Nora D.; Biostatistics and Health Data Science, School of Medicine
Background: Breast malignancies are the predominant cancer-related cause of death in women. New methods of diagnosis, prognosis and treatment are necessary. Previously, we identified the breast cancer cell surface protein ADAM8 as a marker of poor survival, and a driver of Triple-Negative Breast Cancer (TNBC) growth and spread. Immunohistochemistry (IHC) with a research-only anti-ADAM8 antibody revealed 34.0% of TNBCs (17/50) expressed ADAM8. To identify those patients who could benefit from future ADAM8-based interventions, new clinical tests are needed. Here, we report on the preclinical development of a highly specific IHC assay for detection of ADAM8-positive breast tumors. Methods: Formalin-fixed paraffin-embedded sections of ADAM8-positive breast cell lines and patient-derived xenograft tumors were used in IHC to identify a lead antibody, appropriate staining conditions and controls. Patient breast cancer samples (n = 490) were used to validate the assay.
Cox proportional hazards models assessed association between survival and ADAM8 expression. Results: ADAM8 staining conditions were optimized, a lead anti-human ADAM8 monoclonal IHC antibody (ADP2) identified, and a breast staining/scoring control cell line microarray (CCM) generated expressing a range of ADAM8 levels. Assay specificity, reproducibility, and appropriateness of the CCM for scoring tumor samples were demonstrated. Consistent with earlier findings, 36.1% (22/61) of patient TNBCs expressed ADAM8. Overall, 33.9% (166/490) of the breast cancer population was ADAM8-positive, including Hormone Receptor (HR) and Human Epidermal Growth Factor Receptor-2 (HER2) positive cancers, which were tested for the first time. For the most prevalent HR-positive/HER2-negative subtype, high ADAM8 expression identified patients at risk of poor survival. Conclusions: Our studies show ADAM8 is widely expressed in breast cancer and provide support for both a diagnostic and prognostic value of the ADP2 IHC assay. As ADAM8 has been implicated in multiple solid malignancies, continued development of this assay may have broad impact on cancer management.

Item Age-specific mortality rate ratios in adolescents and youth aged 10–24 years living with perinatally versus nonperinatally acquired HIV (Wolters Kluwer, 2021)
Desmonde, Sophie; Ciaranello, Andrea L.; Malateste, Karen; Musick, Beverly; Patten, Gabriela; Thien Vu, An; Edmonds, Andrew; Neilan, Anne M.; Duda, Stephany N.; Wools-Kaloustian, Kara; Davies, Mary-Ann; Leroy, Valériane; Biostatistics and Health Data Science, School of Medicine
Objective: To measure mortality incidence rates and incidence rate ratios (IRR) in adolescents and youth living with perinatally acquired HIV (YPHIV) compared with those living with nonperinatally acquired HIV (YNPHIV), by region, by sex, and during the ages of 10-14, 15-19, and 20-24 years in IeDEA.
Design and methods: All those with a confirmed HIV diagnosis, antiretroviral therapy (ART)-naive at enrollment, and who have post-ART follow-up while aged 10-24 years between 2004 and 2016 were included. We estimated post-ART mortality incidence rates and 95% confidence intervals (95% CI) per 100 person-years for YPHIV (enrolled into care <10 years of age) and YNPHIV (enrolled ≥10 years and <25 years). We estimated mortality IRRs in a negative binomial regression model, adjusted for sex, region, time-varying age, CD4+ cell count at ART initiation (<350 cells/μl, ≥350 cells/μl, unknown), and time on ART (<12 and ≥12 months). Results: Overall, 104 846 adolescents and youth were included: 21 340 (20%) YPHIV (50% women) and 83 506 YNPHIV (80% women). Overall mortality incidence rates were higher among YNPHIV (incidence rate: 2.3/100 person-years; 95% CI: 2.2-2.4) compared with YPHIV (incidence rate: 0.7/100 person-years; 95% CI: 0.7-0.8). Among adolescents aged 10-19 years, mortality was lower among YPHIV compared with YNPHIV (all IRRs <1, ranging from 0.26, 95% CI: 0.13-0.49 in 10-14-year-old boys in the Asia-Pacific to 0.51, 95% CI: 0.30-0.87 in 15-19-year-old boys in West Africa). Conclusion: We report a substantial number of deaths occurring during adolescence. Mortality was significantly higher among YNPHIV compared to YPHIV. Specific interventions including HIV testing and early engagement in care are urgently needed to improve survival among YNPHIV.

Item AIscEA: unsupervised integration of single-cell gene expression and chromatin accessibility via their biological consistency (Oxford University Press, 2022)
Jafari, Elham; Johnson, Travis; Wang, Yue; Liu, Yunlong; Huang, Kun; Wang, Yijie; Biostatistics and Health Data Science, School of Medicine
Motivation: The integrative analysis of single-cell gene expression and chromatin accessibility measurements is essential for revealing gene regulation, but it is one of the key challenges in computational biology.
Gene expression and chromatin accessibility are measurements from different modalities, and no common features can be directly used to guide integration. Current state-of-the-art methods lack practical solutions for finding heterogeneous clusters. Consequently, they might not generate reliable results when cluster heterogeneity exists. More importantly, current methods lack an effective way to select hyper-parameters under an unsupervised setting. Therefore, applying computational methods to integrate single-cell gene expression and chromatin accessibility measurements remains difficult. Results: We introduce AIscEA (Alignment-based Integration of single-cell gene Expression and chromatin Accessibility), a computational method that integrates single-cell gene expression and chromatin accessibility measurements using their biological consistency. AIscEA first defines a ranked similarity score to quantify the biological consistency between cell clusters across measurements. AIscEA then uses the ranked similarity score and a novel permutation test to identify cluster alignment across measurements. AIscEA further utilizes graph alignment for the aligned cell clusters to align the cells across measurements. We compared AIscEA with the competing methods on several benchmark datasets and demonstrated that AIscEA is highly robust to the choice of hyper-parameters and can better handle the cluster heterogeneity problem. Furthermore, AIscEA significantly outperforms the state-of-the-art methods when integrating real-world SNARE-seq and scMultiome-seq datasets in terms of integration accuracy. Availability and implementation: AIscEA is available at https://figshare.com/articles/software/AIscEA_zip/21291135 on FigShare as well as at https://github.com/elhaam/AIscEA on GitHub.