Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence

Motivation: Genome-wide association studies have identified tens of thousands of variants associated with complex traits, most of which fall within non-coding regions, but these variants may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated functional non-coding variants. However, such validated variants are rare because of the technical difficulty and financial cost of the assays. This small sample size makes it difficult to develop a reliable supervised machine learning model for genome-wide prediction of non-coding causal variants.

Results: We exploit a deep transfer learning model, based on a convolutional neural network, to improve the prediction of functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages large-scale generic functional NCVs to improve the learning of low-level features, and context-specific functional NCVs to learn high-level features for the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods on both MPRA and GWAS datasets.
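
The pretrain-then-retrain idea described above can be illustrated with a minimal sketch. It is not the TLVar implementation: the toy model (a few convolutional filters with global max pooling feeding a logistic output), the synthetic "TATA" motif data, the choice to freeze the filters during retraining, and all names (`forward`, `train`, `make_data`, `freeze_filters`) are assumptions made purely for illustration.

```python
import copy
import math
import random

BASES = "ACGT"

def forward(model, seq):
    """Convolutional filters + global max pooling (low level), logistic head (high level)."""
    k = len(model["filters"][0])
    pooled, argmax = [], []
    for filt in model["filters"]:
        # Indexing by base is equivalent to a dot product with a one-hot encoding.
        scores = [sum(filt[j][BASES.index(seq[p + j])] for j in range(k))
                  for p in range(len(seq) - k + 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        pooled.append(scores[best])
        argmax.append(best)
    z = model["bias"] + sum(w * h for w, h in zip(model["head"], pooled))
    z = max(-30.0, min(30.0, z))  # clamp the logit to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z)), pooled, argmax

def train(model, data, epochs, lr, freeze_filters=False):
    """Plain SGD on cross-entropy; optionally keep the low-level filters fixed."""
    k = len(model["filters"][0])
    for _ in range(epochs):
        for seq, label in data:
            prob, pooled, argmax = forward(model, seq)
            err = prob - label  # gradient of the loss with respect to the logit
            for f, (h, p) in enumerate(zip(pooled, argmax)):
                if not freeze_filters:
                    for j in range(k):
                        base = BASES.index(seq[p + j])
                        model["filters"][f][j][base] -= lr * err * model["head"][f]
                model["head"][f] -= lr * err * h
            model["bias"] -= lr * err

def make_data(rng, n, motif, length=12):
    """Hypothetical toy data: every second sequence has the motif planted (label 1)."""
    data = []
    for i in range(n):
        seq = [rng.choice(BASES) for _ in range(length)]
        label = i % 2
        if label:
            start = rng.randrange(length - len(motif) + 1)
            seq[start:start + len(motif)] = motif
        data.append(("".join(seq), label))
    return data

rng = random.Random(0)
model = {
    "filters": [[[rng.gauss(0, 0.1) for _ in BASES] for _ in range(4)] for _ in range(3)],
    "head": [rng.gauss(0, 0.1) for _ in range(3)],
    "bias": 0.0,
}

# Stage 1: pretrain all parameters on a large "generic" set.
generic = make_data(rng, 200, "TATA")
train(model, generic, epochs=5, lr=0.1)

# Stage 2: retrain only the head on a small "context-specific" set.
pretrained_filters = copy.deepcopy(model["filters"])
head_before = list(model["head"])
specific = make_data(rng, 20, "TATA")
train(model, specific, epochs=5, lr=0.05, freeze_filters=True)

prob, _, _ = forward(model, specific[1][0])
print("filters unchanged after retraining:", model["filters"] == pretrained_filters)
```

In this sketch the filters learned in stage 1 are reused as-is, and only the output weights adapt to the small dataset. Whether low-level layers are frozen or merely fine-tuned at a lower learning rate is a design choice that the abstract does not specify.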

Availability and implementation: https://github.com/lichen-lab/TLVar.

Keywords

Genome-Wide Association Study, Genomics, Machine Learning, Computer Neural Networks

Cite As

Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics. 2022;38(12):3164-3172. doi:10.1093/bioinformatics/btac214

Journal

Bioinformatics

Rights

Publisher Policy

Source

PMC

Type

Article

Permanent Link

https://hdl.handle.net/1805/36654

DOI

https://doi.org/10.1093/bioinformatics/btac214

Version

Final published version

Full Text Available at

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890318/

Collections

Open Access Policy Articles
Biostatistics and Health Data Science Works
