Browsing by Author "Liu, Xiaowen"

A discovery-based proteomics approach identifies protein disulphide isomerase (PDIA1) as a biomarker of β cell stress in type 1 diabetes
(Elsevier, 2023) Syed, Farooq; Singhal, Divya; Raedschelders, Koen; Krishnan, Preethi; Bone, Robert N.; McLaughlin, Madeline R.; Van Eyk, Jennifer E.; Mirmira, Raghavendra G.; Yang, Mei-Ling; Mamula, Mark J.; Wu, Huanmei; Liu, Xiaowen; Evans-Molina, Carmella; Pediatrics, School of Medicine
Background: Stress responses within the β cell have been linked with both increased β cell death and accelerated immune activation in type 1 diabetes (T1D). At present, information on the timing and scope of these responses as well as disease-related changes in islet β cell protein expression during T1D development is lacking.
Methods: Data independent acquisition-mass spectrometry was performed on islets collected longitudinally from NOD mice and NOD-SCID mice rendered diabetic through T cell adoptive transfer.
Findings: In islets collected from female NOD mice at 10, 12, and 14 weeks of age, we found a time-restricted upregulation of proteins involved in stress mitigation and maintenance of β cell function, followed by loss of expression of protective proteins that heralded diabetes onset. EIF2 signalling and the unfolded protein response, mTOR signalling, mitochondrial function, and oxidative phosphorylation were commonly modulated pathways in both NOD mice and NOD-SCID mice rendered acutely diabetic by T cell adoptive transfer. Protein disulphide isomerase A1 (PDIA1) was upregulated in NOD islets and pancreatic sections from human organ donors with autoantibody positivity or T1D. Moreover, PDIA1 plasma levels were increased in pre-diabetic NOD mice and in the serum of children with recent-onset T1D compared to non-diabetic controls.
Interpretation: We identified a core set of modulated pathways across distinct mouse models of T1D and identified PDIA1 as a potential human biomarker of β cell stress in T1D.

ACLRO: An Ontology for the Best Practice in ACLR Rehabilitation
(2020-10) Phalakornkule, Kanitha; Jones, Josette F.; Boukai, Ben; Liu, Xiaowen; Purkayatha, Saptarshi; Duncan, William D.
With the rise of big data and the demands for leveraging artificial intelligence (AI), healthcare requires more knowledge sharing that offers machine-readable semantic formalization. Even though some applications allow shared data interoperability, they still lack formal machine-readable semantics in ICD9/10 and LOINC. With ontology, the further ability to represent the shared conceptualizations is possible, similar to SNOMED-CT. Nevertheless, SNOMED-CT mainly focuses on electronic health record (EHR) documenting and evidence-based practice. Moreover, due to its independence from data quality, the ontology enhances advanced AI technologies, such as machine learning (ML), by providing a reusable knowledge framework. Developing a machine-readable and sharable semantic knowledge model incorporating external evidence and individual practice’s values will create a new revolution for best practice medicine. The purpose of this research is to implement a sharable ontology for the best practice in healthcare, with anterior cruciate ligament reconstruction (ACLR) as a case study. The ontology represents knowledge derived from both evidence-based practice (EBP) and practice-based evidence (PBE). First, the study presents how the domain-specific knowledge model is built using a combination of Toronto Virtual Enterprise (TOVE) and a bottom-up approach. Then, I propose a top-down approach using Open Biological and Biomedical Ontology (OBO) Foundry ontologies that adheres to the Basic Formal Ontology (BFO)’s framework. In this step, the EBP, PBE, and statistics ontologies are developed independently.
Next, the study integrates these individual ontologies into the final ACLR Ontology (ACLRO) as a more meaningful model that endorses the reusability and the ease of the model-expansion process since the classes can grow independently from one another. Finally, the study employs a use case and DL queries for model validation. The study's innovation is to present the ontology implementation for best-practice medicine and demonstrate how it can be applied to a real-world setup with semantic information. The ACLRO simultaneously emphasizes knowledge representation in health-intervention, statistics, research design, and external research evidence, while constructing the classes of data-driven and patient-focused processes that allow knowledge sharing explicit of technology. Additionally, the model synthesizes multiple related ontologies, which leads to the successful application of best-practice medicine.

Advanced natural language processing and temporal mining for clinical discovery
(2015-08-17) Mehrabi, Saeed; Jones, Josette F.; Palakal, Mathew J.; Chien, Stanley Yung-Ping; Liu, Xiaowen; Schmidt, C. Max
There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of the clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract information from unstructured clinical texts needed for various clinical applications. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on the context. Extraction of clinical concepts without considering the context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts.
Identifying the negation status and whether a clinical concept belongs to the patient or to the patient's family members are two of the challenges faced in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) has been developed in this research study by taking into account the dependency relationship between negation words and concepts within a sentence using the Stanford Dependency Parser. The study results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history. To assess the generalizability of the negation and family history algorithms, data from a different clinical institution was used in both algorithm evaluations.

Capillary zone electrophoresis-tandem mass spectrometry with activated ion electron transfer dissociation for large-scale top-down proteomics
(Springer, 2019-12) McCool, Elijah N.; Basharat, Abdul Rehman; Liu, Xiaowen; Coon, Joshua J.; Sun, Liangliang; BioHealth Informatics, School of Informatics and Computing
Capillary zone electrophoresis (CZE)-tandem mass spectrometry (MS/MS) has recently been recognized as an efficient approach for top-down proteomics because of its high-capacity separation and highly sensitive detection of proteoforms. However, the commonly used collision-based dissociation methods often cannot provide extensive fragmentation of proteoforms for thorough characterization. Activated ion electron transfer dissociation (AI-ETD), which combines infrared photoactivation concurrent with ETD, has shown better performance for proteoform fragmentation than higher energy-collisional dissociation (HCD) and standard ETD.
Here, we present the first application of CZE-AI-ETD on an Orbitrap Fusion Lumos mass spectrometer for large-scale top-down proteomics of Escherichia coli (E. coli) cells. CZE-AI-ETD outperformed CZE-ETD regarding proteoform and protein identifications (IDs). CZE-AI-ETD reached comparable proteoform and protein IDs with CZE-HCD. CZE-AI-ETD tended to generate better expectation values (E values) of proteoforms than CZE-HCD and CZE-ETD, indicating a higher quality of MS/MS spectra from AI-ETD with respect to the number of sequence-informative fragment ions generated. CZE-AI-ETD showed great reproducibility regarding the proteoform and protein IDs with relative standard deviations less than 4% and 2% (n = 3). Coupling size exclusion chromatography (SEC) to CZE-AI-ETD identified 3028 proteoforms and 387 proteins from E. coli cells with 1% spectrum-level and 5% proteoform-level false discovery rates. The data represents the largest top-down proteomics dataset using the AI-ETD method so far. Single-shot CZE-AI-ETD of one SEC fraction identified 957 proteoforms and 253 proteins. N-terminal truncations, signal peptide cleavage, N-terminal methionine removal, and various post-translational modifications including protein N-terminal acetylation, methylation, S-thiolation, disulfide bonds, and lysine succinylation were detected.

Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations
(American Chemical Society, 2023) Chen, Wenrong; Ding, Zhengming; Zang, Yong; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes.
Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline, PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations), for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized. Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.

Characterization of proteoforms with unknown post-translational modifications using the MIScore
(ACS, 2016) Kou, Qiang; Zhu, Binhai; Wu, Si; Ansong, Charles; Tolić, Nikola; Paša-Tolić, Ljiljana; Liu, Xiaowen; Department of Biohealth Informatics, School of Informatics and Computing
Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome.
Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization aims to identify and localize PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform–spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

Characterizing alternative splicing and long non-coding RNA with high-throughput sequencing technology
(2018-10) Zhou, Ao; Wu, Huanmei; Liu, Yunlong; Janga, Sarath C.; Liu, Xiaowen
Several experimental methods have been developed for the study of the central dogma since the late 20th century. Protein mass spectrometry and next generation sequencing (including DNA-Seq and RNA-Seq) form a triangle of experimental methods, corresponding to the three vertices of the central dogma, i.e., DNA, RNA and protein. Numerous RNA sequencing and protein mass spectrometry experiments have been carried out in an attempt to understand how expression changes of known genes affect biological functions in various organisms. However, it was long overlooked that the resulting data of these experiments are in fact holograms that also reveal other delicate biological mechanisms, such as RNA splicing and the expression of long non-coding RNAs. In this dissertation, we carried out five studies based on high-throughput sequencing data, in an attempt to understand how RNA splicing and differential expression of long non-coding RNAs are associated with biological functions.
In the first two studies, we identified and characterized 197 stimulant-induced and 477 developmentally regulated alternative splicing events from RNA sequencing data. In the third study, we introduced a method for identifying novel alternative splicing events that had never been documented. In the fourth study, we introduced a method for identifying known and novel RNA splicing junctions from protein mass spectrometry data. In the fifth study, we introduced a method for identifying long non-coding RNAs from poly-A selected RNA sequencing data. Taking advantage of these methods, we turned RNA sequencing and protein mass spectrometry data into an information gold mine of splicing and long non-coding RNA activities.

Complex Proteoform Identification Using Top-Down Mass Spectrometry
(2018-12) Kou, Qiang; Wu, Huanmei; Liu, Xiaowen; Liu, Yunlong; Al Hasan, Mohammad
Proteoforms are distinct protein molecule forms created by variations in genes, gene expression, and other biological processes. Many proteoforms contain multiple primary structural alterations, including amino acid substitutions, terminal truncations, and posttranslational modifications. These primary structural alterations play a crucial role in determining protein functions: proteoforms from the same protein with different alterations may exhibit different functional behaviors. Because top-down mass spectrometry directly analyzes intact proteoforms and provides complete sequence information of proteoforms, it has become the method of choice for the identification of complex proteoforms. Although instruments and experimental protocols for top-down mass spectrometry have been advancing rapidly in the past several years, many computational problems in this area remain unsolved, and the development of software tools for analyzing such data is still at its very early stage.
In this dissertation, we propose several novel algorithms for challenging computational problems in proteoform identification by top-down mass spectrometry. First, we present two approximate spectrum-based protein sequence filtering algorithms that quickly find a small number of candidate proteins from a large proteome database for a query mass spectrum. Second, we describe mass graph-based alignment algorithms that efficiently identify proteoforms with variable post-translational modifications and/or terminal truncations. Third, we propose a Markov chain Monte Carlo method for estimating the statistical significance of identified proteoform spectrum matches. They are the first efficient algorithms that take into account three types of alterations: variable post-translational modifications, unexpected alterations, and terminal truncations in proteoform identification. As a result, they are more sensitive and powerful than other existing methods that consider only one or two of the three types of alterations. All the proposed algorithms have been incorporated into TopMG, a complete software pipeline for complex proteoform identification. Experimental results showed that TopMG significantly increases the number of identifications compared with other existing methods in proteome-level top-down mass spectrometry studies. TopMG will facilitate the applications of top-down mass spectrometry in many areas, such as the identification and quantification of clinically relevant proteoforms and the discovery of new proteoform biomarkers.

Computational biology approaches in drug repurposing and gene essentiality screening
(2016-06-20) Philips, Santosh; Li, Lang; Liu, Yunlong; Liu, Xiaowen; Skaar, Todd C.; Janga, Sarath C.
The rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. Within this enormous body of scientific data, knowledge can be exploited, and novel discoveries can be made.
In my dissertation, I have focused on the novel molecular mechanism and therapeutic discoveries from big data for complex diseases. It is very evident today that complex diseases have many factors, including genetics and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time to develop new drugs pose a new challenge in effectively treating complex diseases. In this dissertation, we want to demonstrate the use of existing data and literature as a potential resource for discovering novel therapies and repositioning existing drugs. The key to identifying novel knowledge is in integrating information from decades of research across the different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians who can take advantage of this newly acquired knowledge to make informed decisions. This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively sifted through scientific literature and identified a novel interaction between Vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens. Further, in chapter 2, by exploring a microarray dataset from an estradiol gene sensitivity study, I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic, phenazopyridine. Both discoveries were experimentally validated in the laboratory. In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival.
These results highlight the reality that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines.

Computational Methods for Proteoform Identification and Characterization Using Top-Down Mass Spectrometry
(2023-12) Chen, Wenrong; Yan, Jingwen; Wang, Juexin; Wan, Jun; Zang, Yong; Luo, Xiao; Liu, Xiaowen
Proteoforms, distinct molecular forms of proteins, arise due to numerous factors such as genetic mutations, differential gene expression, alternative splicing, and a range of biological processes. These proteoforms are often characterized by primary structural variances such as amino acid substitutions, terminal truncations, and post-translational modifications (PTMs). Proteoforms from the same proteins can manifest varied functional behaviors based on the specific alterations. The complexity inherent to proteoforms has elevated the significance of top-down mass spectrometry (MS) due to its proficiency in providing intricate sequence information for these intact proteoforms. During a typical top-down MS experiment, intact proteoforms are separated through platforms like liquid chromatography (LC) or capillary zone electrophoresis (CZE) prior to tandem mass spectrometry (MS/MS) analysis. Despite advancements in instruments and protocols for top-down MS, computational challenges persist, with software tool development still in its early stage. In this dissertation, our research revolves around three primary goals, all aimed at refining proteoform characterization. First, we bridge RNA-Seq with top-down MS for better proteoform identification. We propose TopPG, an innovative proteogenomic tool which is tailored to generate proteoform sequence databases from genetic and splicing variations explicitly for top-down MS, in contrast to traditional approaches.
Second, to boost the accuracy of proteoform detection, we utilize machine learning methods to predict proteoform retention and migration times in top-down MS, critically evaluating models in a realm traditionally dominated by bottom-up MS methodologies. Lastly, recognizing the indispensable role of post-translational modifications (PTMs) in cellular functions, we introduce PTM-TBA. This tool integrates the complementary strengths of both top-down and bottom-up MS, augmented with annotations, building a comprehensive strategy for precise PTM identification and localization.
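
To make the idea of matching observed mass shifts to common PTMs more concrete, here is a minimal Python sketch. It is not the PTM-TBA implementation: the function names, the 0.1 Da tolerance, and the four-entry PTM table are assumptions chosen for illustration, and only the monoisotopic mass values are standard figures. It assigns each mass shift to the closest listed PTM within the tolerance and reports the matched fraction, the same kind of ratio as the 814.5 of 2000 (40.7%) quoted in the PTM-TBA abstract above.

```python
from typing import Iterable, Optional, Tuple

# Monoisotopic mass shifts (in Da) for a small, illustrative subset of
# common PTMs. A real pipeline would use a larger, curated table.
COMMON_PTMS = {
    "acetylation": 42.010565,
    "methylation": 14.015650,
    "phosphorylation": 79.966331,
    "oxidation": 15.994915,
}


def match_mass_shift(shift: float, tol: float = 0.1) -> Optional[str]:
    """Return the PTM whose mass is closest to `shift` within `tol` Da, else None.

    The tolerance is a hypothetical default, not a value from the source.
    """
    best_name, best_err = None, tol
    for name, mass in COMMON_PTMS.items():
        err = abs(shift - mass)
        if err <= best_err:
            best_name, best_err = name, err
    return best_name


def summarize(shifts: Iterable[float], tol: float = 0.1) -> Tuple[int, float]:
    """Count the shifts that match a listed PTM and return (count, fraction)."""
    shifts = list(shifts)
    matched = sum(1 for s in shifts if match_mass_shift(s, tol) is not None)
    fraction = matched / len(shifts) if shifts else 0.0
    return matched, fraction
```

For example, `summarize([42.01, 14.02, 100.0, 57.5])` matches the first two shifts (acetylation and methylation) and leaves the other two unexplained. Localizing a PTM to a specific residue and verifying it against bottom-up data or UniProt annotations are separate steps that this sketch does not attempt.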