Browsing by Subject "Bioinformatics"
Item A distinct symptom pattern emerges for COVID-19 long-haul: a nationwide study (Springer Nature, 2022-09-23)
Pinto, Melissa D.; Downs, Charles A.; Huang, Yong; El‑Azab, Sarah A.; Ramrakhiani, Nathan S.; Barisano, Anthony; Yu, Lu; Taylor, Kaitlyn; Esperanca, Alvaro; Abrahim, Heather L.; Hughes, Thomas; Giraldo Herrera, Maria; Rahamani, Amir M.; Dutt, Nikil; Chakraborty, Rana; Mendiola, Christian; Lambert, Natalie; Biostatistics, School of Public Health
Long-haul COVID-19, also called post-acute sequelae of SARS-CoV-2 (PASC), is a new illness caused by SARS-CoV-2 infection and characterized by the persistence of symptoms. The purpose of this cross-sectional study was to identify a distinct and significant temporal pattern of PASC symptoms (symptom type and onset) among a nationwide sample of PASC survivors (n = 5652). The sample was randomly sorted into two independent samples for exploratory (EFA) and confirmatory factor analyses (CFA). Five factors emerged from the EFA: (1) cold and flu-like symptoms, (2) change in smell and/or taste, (3) dyspnea and chest pain, (4) cognitive and visual problems, and (5) cardiac symptoms. The CFA had excellent model fit (χ² = 513.721, df = 207, p < 0.01, TLI = 0.952, CFI = 0.964, RMSEA = 0.024). These findings demonstrate a novel symptom pattern for PASC. These findings can enable nurses in the identification of at-risk patients and facilitate early, systematic symptom management strategies for PASC.

Item Adult skin fibroblast state change in murine wound healing (Springer Nature, 2023-01-17)
Gharbia, Fatma Z.; Abouhashem, Ahmed S.; Moqidem, Yomna A.; Elbaz, Ahmed A.; Abdellatif, Ahmed; Singh, Kanhaiya; Sen, Chandan K.; Azzazy, Hassan M. E.; Surgery, School of Medicine
Wound healing is a well-organized dynamic process involving coordinated consecutive phases: homeostasis, inflammation, proliferation and resolution.
Fibroblasts play major roles in skin wound healing such as in wound contraction and release of growth factors which are of importance in angiogenesis and tissue remodeling. Abnormal fibroblast phenotypes have been identified in patients with chronic wounds. In this work, we analyzed scRNA-seq datasets of normal and wounded skin from mice at day 4 post-wound to investigate fibroblast heterogeneity during the proliferative phase of wound healing. Compositional analysis revealed a specific subset of fibroblasts (cluster 3) that primarily increased in wounded skin (14%) compared to normal skin (3.9%). This subset was characterized by a gene signature marked by the plasma membrane proteins Sfrp2+ Sfrp4+ Sfrp1+ and the transcription factors Ebf1+ Prrx1+ Maged1+. Differential gene expression and enrichment analysis identified epithelial to mesenchymal transition (EMT) and angiogenesis to be upregulated in the emerging subset of fibroblasts of the wounded skin. Analysis of two other datasets for murine wounded skin confirmed the increase in cluster 3-like fibroblasts at days 2, 7 and 14 post-wounding with a peak at day 7. By performing a similarity check between the differential gene expression profile (wounded versus normal skin) of this emerging fibroblast subset and drug signatures from the ConnectivityMap database, we identified drugs capable of mimicking the observed gene expression change in fibroblasts during wound healing. TTNPB, verteprofin and nicotinic acid were identified as candidate drugs capable of inducing the fibroblast gene expression profile necessary for wound healing. On the other hand, methocarbamol, ifosfamide and penbutolol were recognized to antagonize the identified fibroblast differential expression profile during wound healing, which might delay wound healing.
Taken together, analysis of murine transcriptomic skin wound healing datasets suggested a subset of fibroblasts capable of inducing EMT and further inferred drugs that might be tested as potential candidates to induce wound closure.

Item An atlas of substrate specificities for the human serine/threonine kinome (Springer Nature, 2023)
Johnson, Jared L.; Yaron, Tomer M.; Huntsman, Emily M.; Kerelsky, Alexander; Song, Junho; Regev, Amit; Lin, Ting-Yu; Liberatore, Katarina; Cizin, Daniel M.; Cohen, Benjamin M.; Vasan, Neil; Ma, Yilun; Krismer, Konstantin; Torres Robles, Jaylissa; van de Kooij, Bert; van Vlimmeren, Anne E.; Andrée-Busch, Nicole; Käufer, Norbert F.; Dorovkov, Maxim V.; Ryazanov, Alexey G.; Takagi, Yuichiro; Kastenhuber, Edward R.; Goncalves, Marcus D.; Hopkins, Benjamin D.; Elemento, Olivier; Taatjes, Dylan J.; Maucuer, Alexandre; Yamashita, Akio; Degterev, Alexei; Uduman, Mohamed; Lu, Jingyi; Landry, Sean D.; Zhang, Bin; Cossentino, Ian; Linding, Rune; Blenis, John; Hornbeck, Peter V.; Turk, Benjamin E.; Yaffe, Michael B.; Cantley, Lewis C.; Biochemistry and Molecular Biology, School of Medicine
Protein phosphorylation is one of the most widespread post-translational modifications in biology (1,2). With advances in mass-spectrometry-based phosphoproteomics, 90,000 sites of serine and threonine phosphorylation have so far been identified, and several thousand have been associated with human diseases and biological processes (3,4). For the vast majority of phosphorylation events, it is not yet known which of the more than 300 protein serine/threonine (Ser/Thr) kinases encoded in the human genome are responsible (3). Here we used synthetic peptide libraries to profile the substrate sequence specificity of 303 Ser/Thr kinases, comprising more than 84% of those predicted to be active in humans. Viewed in its entirety, the substrate specificity of the kinome was substantially more diverse than expected and was driven extensively by negative selectivity.
We used our kinome-wide dataset to computationally annotate and identify the kinases capable of phosphorylating every reported phosphorylation site in the human Ser/Thr phosphoproteome. For the small minority of phosphosites for which the putative protein kinases involved have been previously reported, our predictions were in excellent agreement. When this approach was applied to examine the signalling response of tissues and cell lines to hormones, growth factors, targeted inhibitors and environmental or genetic perturbations, it revealed unexpected insights into pathway complexity and compensation. Overall, these studies reveal the intrinsic substrate specificity of the human Ser/Thr kinome, illuminate cellular signalling responses and provide a resource to link phosphorylation events to biological pathways.

Item Bibliometric and authorship trends over a 30 year publication history in two representative US sports medicine journals (Elsevier, 2020-03-31)
Dynako, Joseph; Owens, Garrett W.; Loder, Randall T.; Frimpong, Tony; Gerena, Rolando Gabriel; Hasnain, Fawaz; Snyder, Dayton; Freiman, Serena; Hart, Kyle; Kacena, Melissa A.; Whipple, Elizabeth C.; Orthopaedic Surgery, School of Medicine
Bibliometric studies are important to understand changes and improvement opportunities in academia. This study compared bibliometric trends for two major sports medicine/arthroscopy journals, the American Journal of Sports Medicine® (AJSM®) and Arthroscopy® over the past 30 years. Trends over time and comparisons between both journals were noted for common bibliometric variables (number of authors, references, pages, citations, and corresponding author position) as well as author gender and continental origin. Appropriate statistical analyses were performed. A p < 0.001 was considered statistically significant. One representative year per decade was used. There were 814 manuscripts from AJSM® and 650 from Arthroscopy®.
For AJSM® the number of manuscripts steadily increased from 86 in 1986 to 350 in 2016; for Arthroscopy® the number of manuscripts increased from 73 in 1985/1986, to 267 in 2006, but then dropped to 229 in 2016. There were significant increases in all bibliometric variables, except for the number of citations which decreased in Arthroscopy®. There were significant differences in manuscript region of origin by journal (p = 0.000002). Arthroscopy® had a greater percentage of manuscripts from Asia than AJSM® (19.3% vs 11.5%) while AJSM® had a greater percentage from North America (70.3% vs 59.2%); both journals had similar percentages from Europe (18.2% for AJSM® and 21.6% for Arthroscopy®). For AJSM® the average percentage of female first authors was 13.3%, increasing from 4.7% in 1986 to 19.3% in 2016; the average percentage of female corresponding authors was 7.3%. For Arthroscopy®, the average percentage of female first authors was 8.1%, increasing from 2.8% in 1985/1986 to 15.7% in 2016 (p = 0.00007). In conclusion, AJSM® and Arthroscopy® showed an increase in most variables analyzed. Although Arthroscopy® is climbing at a higher rate than AJSM® for female authors, AJSM® has an overall greater percentage of female authors.

Item Bile acids targeted metabolomics and medication classification data in the ADNI1 and ADNIGO/2 cohorts (Nature Research, 2019-10-17)
St. John-Williams, Lisa; Mahmoudiandehkordi, Siamak; Arnold, Matthias; Massaro, Tyler; Blach, Colette; Kastenmüller, Gabi; Louie, Gregory; Kueider-Paisley, Alexandra; Han, Xianlin; Baillie, Rebecca; Motsinger-Reif, Alison A.; Rotroff, Daniel; Nho, Kwangsik; Saykin, Andrew J.; Risacher, Shannon L.; Koal, Therese; Moseley, M. Arthur; Tenenbaum, Jessica D.; Thompson, J. Will; Kaddurah-Daouk, Rima; Alzheimer’s Disease Neuroimaging Initiative; Alzheimer’s Disease Metabolomics Consortium; Radiology and Imaging Sciences, School of Medicine
Alzheimer’s disease (AD) is the most common cause of dementia.
The mechanism of disease development and progression is not well understood, but increasing evidence suggests multifactorial etiology, with a number of genetic, environmental, and aging-related factors. There is a growing body of evidence that metabolic defects may contribute to this complex disease. To interrogate the relationship between system level metabolites and disease susceptibility and progression, the AD Metabolomics Consortium (ADMC) in partnership with AD Neuroimaging Initiative (ADNI) is creating a comprehensive biochemical database for patients in the ADNI1 cohort. We used the Biocrates Bile Acids platform to evaluate the association of metabolic levels with disease risk and progression. We detail the quantitative metabolomics data generated on the baseline samples from ADNI1 and ADNIGO/2 (370 cognitively normal, 887 mild cognitive impairment, and 305 AD). Similar to our previous reports on ADNI1, we present the tools for data quality control and initial analysis. This data descriptor represents the third in a series of comprehensive metabolomics datasets from the ADMC on the ADNI.

Item A bioinformatics approach for precision medicine off-label drug drug selection among triple negative breast cancer patients (Oxford Academic, 2016-07)
Cheng, Lijun; Schneider, Bryan P.; Li, Lang; Medical and Molecular Genetics, School of Medicine
Cancer has been extensively characterized on the basis of genomics. Integrating genetic information about cancers with data on how the cancers respond to target-based therapy helps to optimize cancer treatment. OBJECTIVE: The increasing usage of sequencing technology in cancer research and clinical practice has enormously advanced our understanding of cancer mechanisms. Cancer precision medicine is becoming a reality. Although off-label drug usage is a common practice in treating cancer, it suffers from the lack of a knowledge base for proper cancer drug selection.
This eminent need has become even more apparent considering the upcoming genomics data. METHODS: In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target database, and knowledge sources for the proper cancer drugs and their target selections. Based on the knowledge base, a bioinformatics approach for cancer drug selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. RESULTS: By analyzing data from 85 triple negative breast cancer (TNBC) patients in the Cancer Genome Atlas, we have shown that 71.7% of the TNBC patients have FDA approved drug targets, and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets and 85 candidate drugs are recommended. Many existing TNBC candidate targets, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell division protein kinase 6 (CDK6), epidermal growth factor receptor, etc., were identified. On the other hand, we found some additional targets that are not yet fully investigated in the TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), Smoothened, Frizzled Class Receptor (SMO), etc. Our additional analysis of target and drug selection strategy is also fully supported by the drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. CONCLUSIONS: The proposed bioinformatics approach lays a foundation for cancer precision medicine.
It supplies a much-needed knowledge base for off-label cancer drug usage in clinics.

Item Bioinformatics detection of modulators controlling splicing factor‐dependent intron retention in the human brain (Wiley, 2022)
Chen, Steven X.; Simpson, Ed; Reiter, Jill L.; Liu, Yunlong; Medical and Molecular Genetics, School of Medicine
Alternative RNA splicing is an important means of genetic control and transcriptome diversity. However, when alternative splicing events are studied independently, coordinated splicing modulated by common factors is often not recognized. As a result, the molecular mechanisms of how splicing regulators promote or repress splice site recognition in a context‐dependent manner are not well understood. The functional coupling between multiple gene regulatory layers suggests that splicing is modulated by additional genetic or epigenetic components. Here, we developed a bioinformatics approach to identify causal modulators of splicing activity based on the variation of gene expression in large RNA sequencing datasets. We applied this approach in a neurological context with hundreds of dorsolateral prefrontal cortex samples. Our model is strengthened with the incorporation of genetic variants to impute gene expression in a Mendelian randomization‐based approach. We identified novel modulators of the splicing factor SRSF1, including UIMC1 and the long noncoding RNA CBR3‐AS1, that function over dozens of SRSF1 intron retention splicing targets.
This strategy can be widely used to identify modulators of RNA‐binding proteins involved in tissue‐specific alternative splicing.

Item Biomedical Literature Mining with Transitive Closure and Maximum Network Flow (http://doi.acm.org/10.1145/1851476.1851552, 2011-05-15)
Hoblitzell, Andrew P.; Mukhopadhyay, Snehasis; Xia, Yuni; Fang, Shiafoen
The biological literature is a huge and constantly increasing source of information which the biologist may consult for information about their field, but the vast amount of data can sometimes become overwhelming. Medline, which makes a great amount of biological journal data available online, makes the development of automated text mining systems and hence “data-driven discovery” possible. This thesis examines current work in the field of text mining and biological literature, and then aims to mine documents pertaining to bone biology. The documents are retrieved from PubMed, and then direct associations between the terms are computed. Potentially novel transitive associations among biological objects are then discovered using the transitive closure algorithm and the maximum flow algorithm. The thesis discusses in detail the extraction of biological objects from the collected documents and the co-occurrence based text mining algorithm, the transitive closure algorithm, and the maximum network flow algorithm, which were then run to extract the potentially novel biological associations. Generated hypotheses (novel associations) were assigned significance scores for further validation by a bone biologist expert.
Extension of the work into hypergraphs for enhanced meaning and accuracy is also examined in the thesis.

Item Celltyper: A Single-Cell Sequencing Marker Gene Tool Suite (2023-05)
Paisley, Brianna Meadow; Liu, Yunlong; Yan, Jingwen; Cao, Sha; Wang, Juexin; Carfagna, Mark
Single-cell RNA-sequencing (scRNA-seq) has enabled researchers to study interindividual cellular heterogeneity, to explore disease impact on cellular composition of tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq analysis is to identify the cell type of individual cells. Accurate cell type identification is crucial for any scRNA-seq analysis to be valid as incorrect cell type assignment will reduce statistical robustness and may lead to incorrect biological conclusions. Therefore, accurate and comprehensive cell type assignment is necessary for reliable biological insights into scRNA-seq datasets. With over 200 distinct cell types in humans alone, the concept of cell identity is large. Even within the same cell type there exists heterogeneity due to cell cycle phase, cell state, cell subtypes, cell health and the tissue microenvironment. This makes cell type classification a complicated biological problem requiring bioinformatics. One approach to classify cell type identity is using marker genes. Marker genes are genes specific for one or a few cell types. When coupled with bioinformatic methods, marker genes show promise of improving cell type classification. However, current scRNA-seq classification methods and databases use marker genes that are non-specific across sources, samples, and/or species leading to bias and errors. Furthermore, many existing tools require manual intervention by the user to provide training datasets or the expected number and name of cell types, which can introduce selection bias.
The selection bias negatively impacts the accuracy of cell type classification methods as the model cannot extrapolate outside of the user inputs even when it is biologically meaningful to do so. In this dissertation I developed CellTypeR, a suite of tools to explore the biology governing cell identity in a “normal” state for humans and mice. The work presented here accomplishes three aims: 1. Develop an ontology standardized database of published marker gene literature; 2. Develop and apply a marker gene classification algorithm; and 3. Create user interface and input data structure for scRNA-seq cell type prediction.

Item Complex Proteoform Identification Using Top-Down Mass Spectrometry (2018-12)
Kou, Qiang; Wu, Huanmei; Liu, Xiaowen; Liu, Yunlong; Al Hasan, Mohammad
Proteoforms are distinct protein molecule forms created by variations in genes, gene expression, and other biological processes. Many proteoforms contain multiple primary structural alterations, including amino acid substitutions, terminal truncations, and posttranslational modifications. These primary structural alterations play a crucial role in determining protein functions: proteoforms from the same protein with different alterations may exhibit different functional behaviors. Because top-down mass spectrometry directly analyzes intact proteoforms and provides complete sequence information of proteoforms, it has become the method of choice for the identification of complex proteoforms. Although instruments and experimental protocols for top-down mass spectrometry have been advancing rapidly in the past several years, many computational problems in this area remain unsolved, and the development of software tools for analyzing such data is still at its very early stage. In this dissertation, we propose several novel algorithms for challenging computational problems in proteoform identification by top-down mass spectrometry.
First, we present two approximate spectrum-based protein sequence filtering algorithms that quickly find a small number of candidate proteins from a large proteome database for a query mass spectrum. Second, we describe mass graph-based alignment algorithms that efficiently identify proteoforms with variable post-translational modifications and/or terminal truncations. Third, we propose a Markov chain Monte Carlo method for estimating the statistical significance of identified proteoform spectrum matches. They are the first efficient algorithms that take into account three types of alterations: variable post-translational modifications, unexpected alterations, and terminal truncations in proteoform identification. As a result, they are more sensitive and powerful than other existing methods that consider only one or two of the three types of alterations. All the proposed algorithms have been incorporated into TopMG, a complete software pipeline for complex proteoform identification. Experimental results showed that TopMG significantly increases the number of identifications compared with other existing methods in proteome-level top-down mass spectrometry studies. TopMG will facilitate the applications of top-down mass spectrometry in many areas, such as the identification and quantification of clinically relevant proteoforms and the discovery of new proteoform biomarkers.