Biostatistics and Health Data Science Works

Works authored by scholars from the Department of Biostatistics and Health Data Science, a dual department of the Richard M. Fairbanks School of Public Health and the IU School of Medicine.


Recent Submissions

Now showing 1 - 10 of 418
  • Item
    γ-Aminobutyric acids (GABA) and serum GABA/AABA (G/A) ratio as potential biomarkers of physical performance and aging
    (Springer Nature, 2023-10-10) Lyssikatos, Charalampos; Wang, Zhiying; Liu, Ziyue; Warden, Stuart J.; Bonewald, Lynda; Brotto, Marco; Biostatistics and Health Data Science, School of Medicine
    Declining physical performance with age and disease is an important indicator of declining health. Biomarkers that identify declining physical performance would be useful in predicting treatment outcomes and identifying potential therapeutics. γ-aminobutyric acid (GABA), a muscle autocrine factor, is a potent inhibitor of muscle function and works as a muscle relaxant. L-α-aminobutyric acid (L-AABA) is a biomarker for malnutrition, liver damage, and depression. We sought to determine if GABA and L-AABA may be useful for predicting physical performance. Serum levels of GABA and L-AABA were quantified in 120 individuals divided by age, sex, and physical capacity into low, average, and high performer groups. Analyses explored correlations between serum levels and physical performance. Both GABA and the ratio of GABA/AABA (G/A), but not AABA, were highly positively associated with age (Pearson correlations r = 0.35, p = 0.0001 for GABA, r = 0.31, p = 0.0007 for G/A, n = 120). GABA showed negative associations in the whole cohort with physical performance [fast gait speed, 6 min walk test (6MWT), PROMIS score, and SF36PFS raw score] and with subtotal and femoral neck bone mineral density. L-AABA was positively associated with usual gait speed, 6MWT, total SPPB score, and SF36PFS raw score in the total cohort of 120 human subjects, also with 6MWT and SF36PFS raw score in the 60 male subjects, but no associations were observed in the 60 females. As both GABA and L-AABA appear to be indicative of physical performance, but in opposite directions, we examined the G/A ratio. Unlike GABA, the G/A ratio showed a more distinct association with mobility tests such as total SPPB score, usual and fast gait speed, 6MWT, and SF36PFS raw score in the males, regardless of age and metabolic status. Serum G/A ratio could be potentially linked to physical performance in the male population. Our findings strongly suggest that GABA, L-AABA, and the G/A ratio in human serum may be useful markers for both age and physical function. These new biomarkers may significantly enhance the goal of identifying universal biomarkers to accurately predict physical performance and the beneficial effects of exercise training for older adults.
  • Item
    SAT074 Induction Of Insulin Hypersecretion Uncovers Distinctions Between Adaptive And Maladaptive Endoplasmic Reticulum Stress Response In Beta Cells
    (The Endocrine Society, 2023-10-05) Roy, Gitanjali; Rodrigues dos Santos, Karina; Kwakye, Michael B.; Tan, Zhiyong; Johnson, Travis S.; Kalwat, Michael A.; Biostatistics and Health Data Science, School of Medicine
    Pancreatic islet β-cells release insulin to maintain glucose homeostasis. β-cells must translate, package, and secrete large amounts of insulin. During this process the unfolded protein response of the endoplasmic reticulum (UPRER) is induced to maintain these functions. However, stimuli that force β-cells to secrete insulin at enhanced rates and for prolonged durations risk inducing the terminal UPRER and eventual apoptosis. In a chemical screen for insulin secretion modulators, we discovered SW016789, which caused hypersecretion of insulin and led to a transient induction of the UPRER, but not apoptosis. In contrast, the SERCA2 ER Ca2+ pump inhibitor thapsigargin induces the terminal UPRER. We hypothesized that SW016789 can be used as a tool compound to discover genes involved in β-cell adaptation to hypersecretion-induced stress. We performed time course transcriptomics in MIN6 β-cells exposed to SW016789 (5 µM) or thapsigargin (100 nM) from 0-24 h. Unbiased analyses using a Dirichlet process Gaussian process (DPGP) method revealed clusters of temporally co-regulated genes, and the genes within these clusters were distinct between SW016789 and thapsigargin treatments. In particular, after 6 h of SW016789-induced hypersecretion we found a highly induced cluster of genes (SW cluster 3) enriched in adaptive UPRER factors (e.g. Manf). Conversely, most of the thapsigargin-induced genes clustered at 24 h and were enriched for terminal UPRER factors (e.g. Txnip). Pathway analysis of SW cluster 3 indicated that genes involved in regulation of mRNA methylation and ER-associated degradation were also induced by SW016789 sooner and with greater amplitude than by thapsigargin, suggesting distinct differences in the handling of protein translation and degradation. From the SW cluster 3 genes we selected proteins known to be ER-associated or secreted and generated stable transgenic or CRISPR knockout MIN6 β-cell lines for each. Our data suggest altered expression of these factors may impair glucose-stimulated insulin secretion responses and alter cell viability in the presence or absence of ER stressors including cytokines, thapsigargin, and tunicamycin. In conclusion, we have successfully shown that pharmacological induction of insulin hypersecretion can induce a distinct transcriptional outcome from that of canonically-induced UPRER and that we can take advantage of this property to discover new β-cell regulatory pathways and targets. We envision this dataset as a resource for the secretory biology and islet biology communities.
  • Item
    L-β-aminoisobutyric acid, L-BAIBA, a marker of bone mineral density and body mass index, and D-BAIBA of physical performance and age
    (Springer Nature, 2023-10-11) Lyssikatos, Charalampos; Wang, Zhiying; Liu, Ziyue; Warden, Stuart J.; Brotto, Marco; Bonewald, Lynda; Biostatistics and Health Data Science, School of Medicine
    As both L- and D-BAIBA are increased with exercise, we sought to determine if circulating levels would be associated with physical performance. Serum levels of L- and D-BAIBA were quantified in 120 individuals (50% female) aged 20-85 years and categorized as "low" (LP), "average" (AP), or "high" (HP) performers. Association analysis was performed using Spearman (S) and Pearson (P) correlation. Using Spearman correlation, L-BAIBA was positively associated with (1) body mass index, BMI, (0.23) and total fat mass (0.19) in the 120 participants, (2) total fat mass in the 60 males (0.26), and (3) bone mineral density, BMD, (0.28) in addition to BMI (0.26) in the 60 females. In HP females, L-BAIBA was positively associated with BMD (0.50) and lean mass (0.47). D-BAIBA was positively associated with (1) age (P 0.20) in the 120 participants, (2) age (P 0.49) in the LP females, and (3) gait speed (S 0.20) in the 120 participants. However, in HP males, this enantiomer had a negative association with appendicular lean/height (S -0.52) and in the AP males a negative correlation with BMD (S -0.47). No associations were observed in HP or AP females, whereas, in LP females, a positive association was observed with grip strength (S 0.45), but a negative association with BMD (P -0.52, S -0.63) and chair stands (P -0.47, S -0.51). L-BAIBA may play a role in BMI and BMD in females, not males, whereas D-BAIBA may be a marker for aging and physical performance. The association of L-BAIBA with BMI and fat mass may reveal novel, not previously described functions for this enantiomer.
  • Item
    AIscEA: unsupervised integration of single-cell gene expression and chromatin accessibility via their biological consistency
    (Oxford University Press, 2022) Jafari, Elham; Johnson, Travis; Wang, Yue; Liu, Yunlong; Huang, Kun; Wang, Yijie; Biostatistics and Health Data Science, School of Medicine
    Motivation: The integrative analysis of single-cell gene expression and chromatin accessibility measurements is essential for revealing gene regulation, but it is one of the key challenges in computational biology. Gene expression and chromatin accessibility are measurements from different modalities, and no common features can be directly used to guide integration. Current state-of-the-art methods lack practical solutions for finding heterogeneous clusters, and previous methods might not generate reliable results when cluster heterogeneity exists. More importantly, current methods lack an effective way to select hyper-parameters under an unsupervised setting. Therefore, applying computational methods to integrate single-cell gene expression and chromatin accessibility measurements remains difficult. Results: We introduce AIscEA (Alignment-based Integration of single-cell gene Expression and chromatin Accessibility), a computational method that integrates single-cell gene expression and chromatin accessibility measurements using their biological consistency. AIscEA first defines a ranked similarity score to quantify the biological consistency between cell clusters across measurements. AIscEA then uses the ranked similarity score and a novel permutation test to identify cluster alignment across measurements. AIscEA further utilizes graph alignment for the aligned cell clusters to align the cells across measurements. We compared AIscEA with the competing methods on several benchmark datasets and demonstrated that AIscEA is highly robust to the choice of hyper-parameters and can better handle the cluster heterogeneity problem. Furthermore, AIscEA significantly outperforms the state-of-the-art methods when integrating real-world SNARE-seq and scMultiome-seq datasets in terms of integration accuracy. Availability and implementation: AIscEA is available on FigShare as well as on GitHub.
  • Item
    Age-specific mortality rate ratios in adolescents and youth aged 10–24 years living with perinatally versus nonperinatally acquired HIV
    (Wolters Kluwer, 2021) Desmonde, Sophie; Ciaranello, Andrea L.; Malateste, Karen; Musick, Beverly; Patten, Gabriela; Thien Vu, An; Edmonds, Andrew; Neilan, Anne M.; Duda, Stephany N.; Wools-Kaloustian, Kara; Davies, Mary-Ann; Leroy, Valériane; Biostatistics and Health Data Science, School of Medicine
    Objective: To measure mortality incidence rates and incidence rate ratios (IRR) in adolescents and youth living with perinatally acquired HIV (YPHIV) compared with those living with nonperinatally acquired HIV (YNPHIV), by region, by sex, and during the ages of 10-14, 15-19, and 20-24 years in IeDEA. Design and methods: All those with a confirmed HIV diagnosis, antiretroviral therapy (ART)-naive at enrollment, and who had post-ART follow-up while aged 10-24 years between 2004 and 2016 were included. We estimated post-ART mortality incidence rates and 95% confidence intervals (95% CI) per 100 person-years for YPHIV (enrolled into care <10 years of age) and YNPHIV (enrolled ≥10 years and <25 years). We estimated mortality IRRs in a negative binomial regression model, adjusted for sex, region, time-varying age, CD4+ cell count at ART initiation (<350 cells/μl, ≥350 cells/μl, unknown), and time on ART (<12 and ≥12 months). Results: Overall, 104 846 adolescents and youth were included: 21 340 (20%) YPHIV (50% women) and 83 506 YNPHIV (80% women). Overall mortality incidence rates were higher among YNPHIV (incidence rate: 2.3/100 person-years; 95% CI: 2.2-2.4) compared with YPHIV (incidence rate: 0.7/100 person-years; 95% CI: 0.7-0.8). Among adolescents aged 10-19 years, mortality was lower among YPHIV compared with YNPHIV (all IRRs <1, ranging from 0.26, 95% CI: 0.13-0.49 in 10-14-year-old boys in the Asia-Pacific to 0.51, 95% CI: 0.30-0.87 in 15-19-year-old boys in West Africa). Conclusion: We report a substantial number of deaths occurring during adolescence. Mortality was significantly higher among YNPHIV compared to YPHIV. Specific interventions including HIV testing and early engagement in care are urgently needed to improve survival among YNPHIV.
  • Item
    Markov Additive Processes for Degradation with Jumps under Dynamic Environments
    (National Science Foundation, 2021) Shu, Yin; Feng, Qianmei; Kao, Edward P. C.; Coit, David W.; Liu, Hao; Biostatistics and Health Data Science, School of Medicine
    We use general Markov additive processes (Markov modulated Lévy processes) to integrally handle the complexity of degradation including internally- and externally-induced stochastic properties with complex jump mechanisms. The background component of the Markov additive process is a Markov chain defined on a finite state space; the additive component evolves as a Lévy subordinator under a certain background state, and may have instantaneous nonnegative jumps occurring at the time the background state switches. We derive the Fokker-Planck equations for such Markov modulated processes, based on which we derive Laplace expressions for the reliability function and lifetime moments, represented by the infinitesimal generator matrices of the Markov chain and the Lévy measure of the Lévy subordinator. The strength of our models is their flexibility in modeling degradation data with jumps under dynamic environments. Numerical experiments are used to demonstrate that our general models perform well.
  • Item
    Identifying brain hierarchical structures associated with Alzheimer’s disease using a regularized regression method with tree predictors
    (Oxford University Press, 2023) Zhao, Yi; Wang, Bingkai; Liu, Chin-Fu; Faria, Andreia V.; Miller, Michael I.; Caffo, Brian S.; Luo, Xi; Biostatistics and Health Data Science, School of Medicine
    Brain segmentation at different levels is generally represented as hierarchical trees. Brain regional atrophy at specific levels was found to be marginally associated with Alzheimer’s disease outcomes. In this study, we propose an ℓ1-type regularization for predictors that follow a hierarchical tree structure. Considering a tree as a directed acyclic graph, we interpret the model parameters from a path analysis perspective. Under this concept, the proposed penalty regulates the total effect of each predictor on the outcome. With regularity conditions, it is shown that under the proposed regularization, the estimator of the model coefficient is consistent in ℓ2-norm and the model selection is also consistent. When applied to a brain sMRI dataset acquired from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), the proposed approach identifies brain regions in which atrophy is associated with a decline in memory. With regularization on the total effects, the findings suggest that the impact of atrophy on memory deficits is localized to small brain regions, but at various levels of brain segmentation. Data used in preparation of this paper were obtained from the ADNI database.
  • Item
    Whole Genome Sequencing Analysis of Body Mass Index Identifies Novel African Ancestry-Specific Risk Allele
    (medRxiv, 2023-08-22) Zhang, Xinruo; Brody, Jennifer A.; Graff, Mariaelisa; Highland, Heather M.; Chami, Nathalie; Xu, Hanfei; Wang, Zhe; Ferrier, Kendra; Chittoor, Geetha; Josyula, Navya S.; Li, Xihao; Li, Zilin; Allison, Matthew A.; Becker, Diane M.; Bielak, Lawrence F.; Bis, Joshua C.; Boorgula, Meher Preethi; Bowden, Donald W.; Broome, Jai G.; Buth, Erin J.; Carlson, Christopher S.; Chang, Kyong-Mi; Chavan, Sameer; Chiu, Yen-Feng; Chuang, Lee-Ming; Conomos, Matthew P.; DeMeo, Dawn L.; Du, Margaret; Duggirala, Ravindranath; Eng, Celeste; Fohner, Alison E.; Freedman, Barry I.; Garrett, Melanie E.; Guo, Xiuqing; Haiman, Chris; Heavner, Benjamin D.; Hidalgo, Bertha; Hixson, James E.; Ho, Yuk-Lam; Hobbs, Brian D.; Hu, Donglei; Hui, Qin; Hwu, Chii-Min; Jackson, Rebecca D.; Jain, Deepti; Kalyani, Rita R.; Kardia, Sharon L. R.; Kelly, Tanika N.; Lange, Ethan M.; LeNoir, Michael; Li, Changwei; Marchand, Loic Le; McDonald, Merry-Lynn N.; McHugh, Caitlin P.; Morrison, Alanna C.; Naseri, Take; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; O'Connell, Jeffrey; O'Donnell, Christopher J.; Palmer, Nicholette D.; Pankow, James S.; Perry, James A.; Peters, Ulrike; Preuss, Michael H.; Rao, D. C.; Regan, Elizabeth A.; Reupena, Sefuiva M.; Roden, Dan M.; Rodriguez-Santana, Jose; Sitlani, Colleen M.; Smith, Jennifer A.; Tiwari, Hemant K.; Vasan, Ramachandran S.; Wang, Zeyuan; Weeks, Daniel E.; Wessel, Jennifer; Wiggins, Kerri L.; Wilkens, Lynne R.; Wilson, Peter W. F.; Yanek, Lisa R.; Yoneda, Zachary T.; Zhao, Wei; Zöllner, Sebastian; Arnett, Donna K.; Ashley-Koch, Allison E.; Barnes, Kathleen C.; Blangero, John; Boerwinkle, Eric; Burchard, Esteban G.; Carson, April P.; Chasman, Daniel I.; Chen, Yii-Der Ida; Curran, Joanne E.; Fornage, Myriam; Gordeuk, Victor R.; He, Jiang; Heckbert, Susan R.; Hou, Lifang; Irvin, Marguerite R.; Kooperberg, Charles; Minster, Ryan L.; Mitchell, Braxton D.; Nouraie, Mehdi; Psaty, Bruce M.; Raffield, Laura M.; Reiner, Alexander P.; Rich, Stephen S.; Rotter, Jerome I.; Shoemaker, M. Benjamin; Smith, Nicholas L.; Taylor, Kent D.; Telen, Marilyn J.; Weiss, Scott T.; Zhang, Yingze; Heard-Costa, Nancy; Sun, Yan V.; Lin, Xihong; Cupples, L. Adrienne; Lange, Leslie A.; Liu, Ching-Ti; Loos, Ruth J. F.; North, Kari E.; Justice, Anne E.; Biostatistics and Health Data Science, School of Medicine
    Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, of which 51% were of non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10⁻⁹). Notably, we identified and replicated a novel low frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.
  • Item
    Site-Level Comprehensiveness of Care Is Associated with Individual Clinical Retention Among Adults Living with HIV in International Epidemiology Databases to Evaluate AIDS, a Global HIV Cohort Collaboration, 2000-2016
    (Mary Ann Liebert, 2022) Wada, Paul Y.; Kim, Ahra; Jayathilake, Karu; Duda, Stephany N.; Abo, Yao; Althoff, Keri N.; Cornell, Morna; Musick, Beverly; Brown, Steve; Sohn, Annette H.; Chan, Yu Jiun; Wools-Kaloustian, Kara K.; Nash, Denis; Yiannoutsos, Constantin T.; Cesar, Carina; McGowan, Catherine C.; Rebeiro, Peter F.; Biostatistics and Health Data Science, School of Medicine
    Retention in care (RIC) reduces HIV transmission and associated morbidity and mortality. We examined whether delivery of comprehensive services influenced individual RIC within the International epidemiology Databases to Evaluate AIDS (IeDEA) network. We collected site data through IeDEA assessments 1.0 (2000–2009) and 2.0 (2010–2016). Each site received a comprehensiveness score for service availability (1 = present, 0 = absent), with tallies ranging from 0 to 7. We obtained individual-level cohort data for adults with at least one visit from 2000 to 2016 at sites responding to either assessment. Person-time was recorded annually, with RIC defined as completing two visits at least 90 days apart in each calendar year. Multivariable modified Poisson regression clustered by site yielded risk ratios and predicted probabilities for individual RIC by comprehensiveness. Among 347,060 individuals in care at 122 sites with 1,619,558 person-years of follow-up, 69.8% of person-time was retained in care, varying by region from 53.8% (Asia-Pacific) to 82.7% (East Africa); RIC improved by about 2% per year from 2000 to 2016 (p = 0.012). Every site provided CD4+ count testing, and >90% of individuals received care at sites that provided combination antiretroviral therapy adherence measures, prevention of mother-to-child transmission, tuberculosis screening, HIV-related prevention, and community tracing services. In adjusted models, individuals at sites with more comprehensive services had higher probabilities of RIC (0.71, 0.74, and 0.83 for scores 5, 6, and 7, respectively; p = 0.019). Within IeDEA, greater site-level comprehensiveness of services was associated with improved individual RIC. Much work remains in exploring this relationship, which may inform HIV clinical practice and health systems planning.
  • Item
    SiGra: single-cell spatial elucidation through an image-augmented graph transformer
    (Springer Nature, 2023-09-12) Tang, Ziyang; Li, Zuotian; Hou, Tieying; Zhang, Tonglin; Yang, Baijian; Su, Jing; Song, Qianqian; Biostatistics and Health Data Science, School of Medicine
    Recent advances in high-throughput molecular imaging have pushed spatial transcriptomics technologies to subcellular resolution, which surpasses the limitations of both single-cell RNA-seq and array-based spatial profiling. The multichannel immunohistochemistry images in such data provide rich information on the cell types, functions, and morphologies of cellular compartments. In this work, we developed a method, single-cell spatial elucidation through image-augmented Graph transformer (SiGra), to leverage such imaging information for revealing spatial domains and enhancing substantially sparse and noisy transcriptomics data. SiGra applies hybrid graph transformers over a single-cell spatial graph. SiGra outperforms state-of-the-art methods on both single-cell and spot-level spatial transcriptomics data from complex tissues. The inclusion of immunohistochemistry images improves the model performance by 37% (95% CI: 27-50%). SiGra improves the characterization of intratumor heterogeneity and intercellular communication and recovers the known microscopic anatomy. Overall, SiGra effectively integrates different spatial modality data to gain deep insights into spatial cellular ecosystems.