In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), a high-throughput in vitro method, can test thousands of variants simultaneously for allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, those showing differential allelic regulatory effects on gene expression, usually number only in the hundreds, which limits their usefulness as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, that generates labelled variants in silico to augment the training sample. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves prediction performance for MPRA regulatory variants compared with the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as an example, we apply MpraVAE to perform genome-wide prediction of regulatory variants and find that, relative to background variants, predicted regulatory variants are more enriched in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter activity, enhancer activity, and binding sites of c-Myc and Pol II, which regulate gene expression. Importantly, predicted regulatory variants are linked to immune-related genes through chromatin loops and accessible chromatin, demonstrating the value of MpraVAE for genetic variant and gene discovery in complex traits.

Keywords

Massively parallel reporter assays (MPRAs), Genetic variants, Gene expression

Cite As

Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. Preprint. bioRxiv. 2024;2024.06.25.600715. Published 2024 Jun 29. doi:10.1101/2024.06.25.600715

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Source

PMC

Type

Article

Permanent Link

https://hdl.handle.net/1805/43361

DOI

https://doi.org/10.1101/2024.06.25.600715

Version

Preprint

Collections

Open Access Policy Articles
Department of Medical and Molecular Genetics Works
