Browsing by Author "Long, Qi"
Item An integrative latent class model of heterogeneous data modalities for diagnosing kidney obstruction (Oxford University Press, 2024) Jang, Jeong Hoon; Chang, Changgee; Manatunga, Amita K.; Taylor, Andrew T.; Long, Qi; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Radionuclide imaging plays a critical role in the diagnosis and management of kidney obstruction. However, most practicing radiologists in US hospitals have insufficient time and resources to acquire the training and experience needed to interpret radionuclide images, leading to increased diagnostic errors. To tackle this problem, Emory University embarked on a study that aims to develop a computer-assisted diagnostic (CAD) tool for kidney obstruction by mining and analyzing patient data comprised of renogram curves, ordinal expert ratings on the obstruction status, pharmacokinetic variables, and demographic information. The major challenges here are the heterogeneity in data modes and the lack of a gold standard for determining kidney obstruction. In this article, we develop a statistically principled CAD tool based on an integrative latent class model that leverages heterogeneous data modalities available for each patient to provide accurate prediction of kidney obstruction. Our integrative model consists of three sub-models (multilevel functional latent factor regression model, probit scalar-on-function regression model, and Gaussian mixture model), each of which is tailored to the specific data mode and depends on the unknown obstruction status (latent class). An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. Extensive simulations are conducted to evaluate the performance of the proposed method.
An application to an Emory renal study demonstrates the usefulness of our model as a CAD tool for kidney obstruction.

Item A Bayesian multiple imputation approach to bivariate functional data with missing components (Wiley, 2021) Jang, Jeong Hoon; Manatunga, Amita K.; Chang, Changgee; Long, Qi; Biostatistics and Health Data Science, School of Medicine
Existing missing data methods for functional data mainly focus on reconstructing missing measurements along a single function, that is, a univariate functional data setting. Motivated by a renal study, we focus on a bivariate functional data setting, where each sampling unit is a collection of two distinct component functions, one of which may be missing. Specifically, we propose a Bayesian multiple imputation approach based on a bivariate functional latent factor model that exploits the joint changing patterns of the component functions to allow accurate and stable imputation of one component given the other. We further extend the framework to address multilevel bivariate functional data with missing components by modeling and exploiting inter-component and intra-subject correlations. We develop a Gibbs sampling algorithm that simultaneously generates multiple imputations of missing component functions and posterior samples of model parameters. For multilevel bivariate functional data, a partially collapsed Gibbs sampler is implemented to improve computational efficiency. Our simulation study demonstrates that our methods outperform other competing methods for imputing missing components of bivariate functional data under various designs and missingness rates. The motivating renal study aims to investigate the distribution and pharmacokinetic properties of baseline and post-furosemide renogram curves that provide further insights into the underlying mechanism of renal obstruction, with post-furosemide renogram curves missing for some subjects.
We apply the proposed methods to impute missing post-furosemide renogram curves and obtain more refined insights.

Item Brain-wide genome-wide colocalization study for integrating genetics, transcriptomics and brain morphometry in Alzheimer’s disease (Elsevier, 2023) Bao, Jingxuan; Wen, Junhao; Wen, Zixuan; Yang, Shu; Cui, Yuhan; Yang, Zhijian; Erus, Guray; Saykin, Andrew J.; Long, Qi; Davatzikos, Christos; Shen, Li; Radiology and Imaging Sciences, School of Medicine
Alzheimer’s disease (AD) is one of the most common neurodegenerative diseases. However, the AD mechanism has not yet been fully elucidated, hindering the development of effective therapies. In our work, we perform a brain imaging genomics study to link genetics, single-cell gene expression data, tissue-specific gene expression data, brain imaging-derived volumetric endophenotypes, and disease diagnosis to discover potential underlying neurobiological pathways for AD. To do so, we perform brain-wide genome-wide colocalization analyses to integrate multidimensional imaging genomic biobank data. Specifically, we use (1) the individual-level imputed genotyping data and magnetic resonance imaging (MRI) data from the UK Biobank, (2) the summary statistics of the genome-wide association study (GWAS) from multiple European ancestry cohorts, and (3) the tissue-specific cis-expression quantitative trait loci (cis-eQTL) summary statistics from the GTEx project. We apply a Bayes factor colocalization framework and mediation analysis to these multi-modal imaging genomic data. As a result, we derive the brain regional level GWAS summary statistics for 145 brain regions with 482,831 single nucleotide polymorphisms (SNPs) followed by post hoc functional annotations.
Our analysis yields the discovery of a potential AD causal pathway from a systems biology perspective: the SNP chr10:124165615:G>A (rs6585827) mutation upregulates the expression of the BTBD16 gene in oligodendrocytes, a specialized type of glial cell, in the brain cortex, leading to a reduced risk of volumetric loss in the entorhinal cortex, resulting in a protective effect on AD. We substantiate our findings with multiple lines of evidence from existing imaging, genetic and genomic studies in the AD literature. Our study connects genetics, molecular and cellular signatures, regional brain morphologic endophenotypes, and AD diagnosis, providing new insights into the mechanistic understanding of the disease. Our findings can provide valuable guidance for subsequent therapeutic target identification and drug discovery in AD.

Item Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis (Oxford University Press, 2023) Bao, Jingxuan; Chang, Changgee; Zhang, Qiyiwen; Saykin, Andrew J.; Shen, Li; Long, Qi; Alzheimer’s Disease Neuroimaging Initiative; Radiology and Imaging Sciences, School of Medicine
Motivation: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer's disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way.
Method: Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that our proposed SBFA framework could achieve the best performance compared with the other state-of-the-art factor-analysis-based integrative analysis methods. Results: We apply our proposed SBFA model together with several state-of-the-art factor analysis models to extract the latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the functional activities questionnaire score, an important measurement for diagnosis of AD quantifying subjects' abilities in daily life. Our SBFA model shows the best prediction performance compared with the other factor analysis models. Availability: Code is publicly available at https://github.com/JingxuanBao/SBFA.

Item Multi-task learning based structured sparse canonical correlation analysis for brain imaging genetics (Elsevier, 2022-02) Kim, Mansu; Min, Eun Jeong; Liu, Kefei; Yan, Jingwen; Saykin, Andrew J.; Moore, Jason H.; Long, Qi; Shen, Li; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
The advances in technologies for acquiring brain imaging and high-throughput genetic data allow the researcher to access a large amount of multi-modal data. Although sparse canonical correlation analysis is a powerful bi-multivariate association analysis technique for feature selection, we are still facing major challenges in integrating multi-modal imaging genetic data and yielding biologically meaningful interpretation of imaging genetic findings.
In this study, we propose a novel multi-task learning based structured sparse canonical correlation analysis (MTS2CCA) to deliver interpretable results and improve integration in imaging genetics studies. We perform comparative studies with state-of-the-art competing methods on both simulation and real imaging genetic data. On the simulation data, our proposed model has achieved the best performance in terms of canonical correlation coefficients, estimation accuracy, and feature selection accuracy. On the real imaging genetic data, our proposed model has revealed promising features of single-nucleotide polymorphisms and brain regions related to sleep. The identified features can be used to improve clinical score prediction using promising imaging genetic biomarkers. An interesting future direction is to apply our model to additional neurological or psychiatric cohorts such as patients with Alzheimer’s or Parkinson’s disease to demonstrate the generalizability of our method.

Item Polygenic mediation analysis of Alzheimer's disease implicated intermediate amyloid imaging phenotypes (American Medical Informatics Association, 2021-01-25) Eng, Yingxuan; Yao, Xiaohui; Liu, Kefei; Risacher, Shannon L.; Saykin, Andrew J.; Long, Qi; Zhao, Yize; Shen, Li; Radiology and Imaging Sciences, School of Medicine
Mediation models have been employed in the study of brain disorders to detect the underlying mechanisms between genetic variants and diagnostic outcomes implicitly mediated by intermediate imaging biomarkers. However, the statistical power is influenced by the modest effects of individual genetic variants on both diagnostic and imaging phenotypes and the limited sample sizes of imaging genetic cohorts. In this study, we propose a polygenic mediation analysis that comprises a polygenic risk score (PRS) to aggregate genetic effects of a set of candidate variants and then explore the implicit effect of imaging phenotypes between the PRS and disease status.
We applied our proposed method to an amyloid imaging genetic study of Alzheimer's disease (AD), identified multiple imaging mediators linking PRS with AD, and further demonstrated the promise of the PRS on mediator detection over individual variants alone.

Item Preference Matrix Guided Sparse Canonical Correlation Analysis for Genetic Study of Quantitative Traits in Alzheimer’s Disease (IEEE, 2022-12) Sha, Jiahang; Bao, Jingxuan; Liu, Kefei; Yang, Shu; Wen, Zixuan; Cui, Yuhan; Wen, Junhao; Davatzikos, Christos; Moore, Jason H.; Saykin, Andrew J.; Long, Qi; Shen, Li; Radiology and Imaging Sciences, School of Medicine
Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. Specifically for Alzheimer’s disease, the association between genetic markers and quantitative traits remains vague, although once identified it will provide valuable guidance for the study and development of genetic-based treatment approaches. Currently, to analyze the association of two modalities, sparse canonical correlation analysis (SCCA) is commonly used to compute one sparse linear combination of the variable features for each modality, giving a pair of linear combination vectors in total that maximizes the cross-correlation between the analyzed modalities. One drawback of the plain SCCA model is that the existing findings and knowledge cannot be integrated into the model as priors to help extract interesting correlations as well as identify biologically meaningful genetic and phenotypic markers. To bridge this gap, we introduce preference matrix guided SCCA (PM-SCCA) that not only takes priors encoded as a preference matrix but also maintains computational simplicity. A simulation study and a real-data experiment are conducted to investigate the effectiveness of the model.
Both experiments demonstrate that the proposed PM-SCCA model can capture not only genotype-phenotype correlation but also relevant features effectively.

Item Preference Matrix Guided Sparse Canonical Correlation Analysis for Mining Brain Imaging Genetic Associations in Alzheimer’s Disease (Elsevier, 2023) Sha, Jiahang; Bao, Jingxuan; Liu, Kefei; Yang, Shu; Wen, Zixuan; Wen, Junhao; Cui, Yuhan; Tong, Boning; Moore, Jason H.; Saykin, Andrew J.; Davatzikos, Christos; Long, Qi; Shen, Li; Alzheimer’s Disease Neuroimaging Initiative; Radiology and Imaging Sciences, School of Medicine
Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. Specifically for Alzheimer's disease, the association between genetic markers and quantitative traits remains vague, although once identified it will provide valuable guidance for the study and development of genetics-based treatment approaches. Currently, to analyze the association of two modalities, sparse canonical correlation analysis (SCCA) is commonly used to compute one sparse linear combination of the variable features for each modality, giving a pair of linear combination vectors in total that maximizes the cross-correlation between the analyzed modalities. One drawback of the plain SCCA model is that the existing findings and knowledge cannot be integrated into the model as priors to help extract interesting correlations as well as identify biologically meaningful genetic and phenotypic markers. To bridge this gap, we introduce preference matrix guided SCCA (PM-SCCA) that not only takes priors encoded as a preference matrix but also maintains computational simplicity. A simulation study and a real-data experiment are conducted to investigate the effectiveness of the model.
Both experiments demonstrate that the proposed PM-SCCA model can capture not only genotype-phenotype correlation but also relevant features effectively.

Item Robust knowledge-guided biclustering for multi-omics data (Oxford University Press, 2023) Zhang, Qiyiwen; Chang, Changgee; Long, Qi; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Biclustering is a useful method for simultaneously grouping samples and features and has been applied across various biomedical data types. However, most existing biclustering methods lack the ability to integratively analyze multi-modal data, such as multi-omics data comprising genome, transcriptome and epigenome. Moreover, the potential of leveraging biological knowledge represented by graphs, which has been demonstrated to be beneficial in various statistical tasks such as variable selection and prediction, remains largely untapped in the context of biclustering. To address both, we propose a novel Bayesian biclustering method called Bayesian graph-guided biclustering (BGB). Specifically, we introduce a new hierarchical sparsity-inducing prior to effectively incorporate biological graph information and establish a unified framework to model multi-view data. We develop an efficient Markov chain Monte Carlo algorithm to conduct posterior sampling and inference. Extensive simulations and real data analysis show that BGB outperforms other popular biclustering methods. Notably, BGB is robust in terms of utilizing biological knowledge and has the capability to reveal biologically meaningful information from heterogeneous multi-modal data.
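
Illustrative note: several entries in this listing build on sparse canonical correlation analysis (SCCA), which the PM-SCCA abstracts describe as finding one sparse weight vector per modality so that the pair maximizes the cross-correlation. The sketch below shows plain SCCA only, using alternating soft-thresholded updates on the sample cross-correlation matrix, which is one common way to fit such a model. It is not the PM-SCCA or MTS2CCA method from the listed papers. The function names, the penalty values `lam_u` and `lam_v`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries below lam become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def normalize(a):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=50):
    """Plain SCCA sketch: one pair of sparse weight vectors (u, v).

    lam_u and lam_v are hypothetical L1 penalty levels chosen for this demo.
    """
    # Standardize columns so X.T @ Y / n is the sample cross-correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys / Xs.shape[0]
    # Start from the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    for _ in range(n_iter):
        u = normalize(soft_threshold(C @ v, lam_u))
        v = normalize(soft_threshold(C.T @ u, lam_v))
    return u, v

# Simulated data (an assumption): a shared latent signal z drives the first
# three features of X and the first two features of Y; the rest are noise.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
X = rng.normal(size=(n, 20))
X[:, :3] += 2.0 * z[:, None]
Y = rng.normal(size=(n, 15))
Y[:, :2] += 2.0 * z[:, None]

u, v = sparse_cca(X, Y)
print("nonzero X weights:", np.flatnonzero(u))
print("nonzero Y weights:", np.flatnonzero(v))
print("canonical correlation:", np.corrcoef(X @ u, Y @ v)[0, 1])
```

In this sketch the sparsity comes entirely from the soft-thresholding step. A method that incorporates prior knowledge, as the preference matrix guided variants above describe, would have to modify the objective or the penalty, which this example does not attempt.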