Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets

dc.contributor.author: Hassan, Doaa
dc.contributor.author: Ariyur, Aditya
dc.contributor.author: Daulatabad, Swapna Vidhur
dc.contributor.author: Mir, Quoseena
dc.contributor.author: Janga, Sarath Chandra
dc.contributor.department: BioHealth Informatics, School of Informatics and Computing
dc.date.accessioned: 2024-07-31T10:42:44Z
dc.date.available: 2024-07-31T10:42:44Z
dc.date.issued: 2024
dc.description.abstract: 2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long-read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with K-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes like glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
dc.eprint.version: Final published version
dc.identifier.citation: Hassan D, Ariyur A, Daulatabad SV, Mir Q, Janga SC. Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets. RNA Biol. 2024;21(1):1-15. doi:10.1080/15476286.2024.2352192
dc.identifier.uri: https://hdl.handle.net/1805/42492
dc.language.iso: en_US
dc.publisher: Taylor & Francis
dc.relation.isversionof: 10.1080/15476286.2024.2352192
dc.relation.journal: RNA Biology
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.source: PMC
dc.subject: Machine Learning
dc.subject: Nm (2´-O-methylation)
dc.subject: Oxford Nanopore Technology
dc.subject: RNA modification detection
dc.subject: Single molecule direct RNA sequencing
dc.subject: Transcriptomics
dc.title: Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets
dc.type: Article
Files
Original bundle
Name: Hassan2024Nano-CCBYNC.pdf
Size: 8.52 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission
Description: