HapCNV: A Comprehensive Framework for CNV Detection in Low-input DNA Sequencing Data

dc.contributor.author: Yu, Xuanxuan
dc.contributor.author: Qin, Fei
dc.contributor.author: Liu, Shiwei
dc.contributor.author: Brown, Noah J.
dc.contributor.author: Lu, Qing
dc.contributor.author: Cai, Guoshuai
dc.contributor.author: Guler, Jennifer L.
dc.contributor.author: Xiao, Feifei
dc.contributor.department: Radiology and Imaging Sciences, School of Medicine
dc.date.accessioned: 2025-02-24T16:56:38Z
dc.date.available: 2025-02-24T16:56:38Z
dc.date.issued: 2025-01-07
dc.description.abstract: Copy number variants (CNVs) are prevalent in both diploid and haploid genomes, the latter containing a single copy of each gene. Studying CNVs in genomes from single or few cells is significantly advancing our knowledge of human disorders and disease susceptibility. Low-input sequencing data, including low-cell and single-cell data, from haploid and diploid organisms generally display shallow and highly non-uniform read counts because whole-genome amplification introduces amplification biases. In addition, haploid organisms typically possess relatively short genomes and require a higher degree of DNA amplification than diploid organisms. However, most CNV detection methods were developed for diploid genomes without specific consideration of haploid genomes. Challenges also arise from the reference samples, or normal controls, that provide baseline signals for defining copy number losses or gains. In traditional methods, references are usually pre-specified from cells that are assumed to be normal or disease-free. However, pre-defined reference cells can bias results if common CNVs are present. Here, we present HapCNV, a comprehensive statistical framework for data normalization and CNV detection in haploid single- or low-cell DNA sequencing data. Its main advance is the construction of a novel genomic-location-specific pseudo-reference that selects unbiased references using a preliminary cell clustering method. This approach effectively preserves common CNVs. Using simulations, we demonstrated that HapCNV outperformed existing methods in CNV detection accuracy, especially for short CNVs. The superior performance of HapCNV was also validated by detecting known CNVs in a real P. falciparum parasite dataset.
In conclusion, HapCNV provides a novel and useful approach for CNV detection in haploid low-input sequencing datasets and is readily applicable to diploids.
dc.eprint.version: Preprint
dc.identifier.citation: Yu X, Qin F, Liu S, et al. HapCNV: A Comprehensive Framework for CNV Detection in Low-input DNA Sequencing Data. Preprint. bioRxiv. 2025;2024.12.19.629494. Published 2025 Jan 7. doi:10.1101/2024.12.19.629494
dc.identifier.uri: https://hdl.handle.net/1805/45982
dc.language.iso: en_US
dc.publisher: bioRxiv
dc.relation.isversionof: 10.1101/2024.12.19.629494
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [en]
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0
dc.source: PMC
dc.subject: Single-cell DNA sequencing
dc.subject: Low-input sequencing
dc.subject: Copy number variation
dc.subject: Haploid
dc.subject: Pseudo-reference sequence
dc.title: HapCNV: A Comprehensive Framework for CNV Detection in Low-input DNA Sequencing Data
dc.type: Article
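
The abstract describes a genomic-location-specific pseudo-reference built from a preliminary clustering of cells, but this record gives no algorithmic detail. The sketch below illustrates only the general idea and is not HapCNV's actual procedure. It assumes a cells-by-bins matrix of normalized read depth and a simple 1-D k-means per bin. The rule for choosing reference cells (the most populated cluster) and all function names (`kmeans_1d`, `pseudo_reference`, `log2_ratio`) are hypothetical. With this rule, a CNV is retained only if it is carried by a minority of cells in that bin.

```python
import numpy as np


def kmeans_1d(values, k=3, n_iter=50):
    """Small deterministic 1-D k-means; centers start at spread-out quantiles."""
    centers = np.quantile(values, np.linspace(0.1, 0.9, k))
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        # Assign each cell to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def pseudo_reference(depth, k=3):
    """Per-bin reference: median depth of the most populated cell cluster.

    depth: array of shape (n_cells, n_bins) holding normalized read depth.
    ASSUMPTION: 'largest cluster = reference cells' is an illustrative rule,
    not the selection rule used by HapCNV.
    """
    n_bins = depth.shape[1]
    ref = np.empty(n_bins)
    for b in range(n_bins):
        labels, _ = kmeans_1d(depth[:, b], k=k)
        biggest = np.bincount(labels, minlength=k).argmax()
        ref[b] = np.median(depth[labels == biggest, b])
    return ref


def log2_ratio(depth, ref, eps=1e-9):
    """Log2 ratio of each cell's depth to the per-bin pseudo-reference."""
    return np.log2((depth + eps) / (ref + eps))


# Toy data: 10 cells, 5 bins, a bin-specific amplification bias,
# and a duplication in bin 3 carried by cells 0-2.
bin_bias = np.array([1.0, 0.5, 2.0, 1.0, 1.5])
depth = np.tile(bin_bias, (10, 1))
depth[:3, 3] *= 2.0

ref = pseudo_reference(depth)
ratios = log2_ratio(depth, ref)
```

In this toy example the per-bin reference absorbs the bin-specific bias, so the log2 ratios are near zero everywhere except the duplicated bin in the three carrier cells.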
Files
Original bundle
Name: Yu2025HapCNV-CCBYNCND.pdf
Size: 2.97 MB
Format: Adobe Portable Document Format