Browsing by Subject "Database searching"

Item Complex Proteoform Identification Using Top-Down Mass Spectrometry (2018-12)
Kou, Qiang; Wu, Huanmei; Liu, Xiaowen; Liu, Yunlong; Al Hasan, Mohammad
Proteoforms are distinct protein molecule forms created by variations in genes, gene expression, and other biological processes. Many proteoforms contain multiple primary structural alterations, including amino acid substitutions, terminal truncations, and post-translational modifications. These primary structural alterations play a crucial role in determining protein functions: proteoforms from the same protein with different alterations may exhibit different functional behaviors. Because top-down mass spectrometry directly analyzes intact proteoforms and provides complete sequence information of proteoforms, it has become the method of choice for the identification of complex proteoforms. Although instruments and experimental protocols for top-down mass spectrometry have advanced rapidly in the past several years, many computational problems in this area remain unsolved, and the development of software tools for analyzing such data is still at a very early stage. In this dissertation, we propose several novel algorithms for challenging computational problems in proteoform identification by top-down mass spectrometry. First, we present two approximate spectrum-based protein sequence filtering algorithms that quickly find a small number of candidate proteins from a large proteome database for a query mass spectrum. Second, we describe mass graph-based alignment algorithms that efficiently identify proteoforms with variable post-translational modifications and/or terminal truncations. Third, we propose a Markov chain Monte Carlo method for estimating the statistical significance of identified proteoform spectrum matches.
These are the first efficient algorithms for proteoform identification that take into account three types of alterations: variable post-translational modifications, unexpected alterations, and terminal truncations. As a result, they are more sensitive and powerful than existing methods that consider only one or two of the three types of alterations. All the proposed algorithms have been incorporated into TopMG, a complete software pipeline for complex proteoform identification. Experimental results showed that TopMG yields significantly more identifications than other existing methods in proteome-level top-down mass spectrometry studies. TopMG will facilitate the application of top-down mass spectrometry in many areas, such as the identification and quantification of clinically relevant proteoforms and the discovery of new proteoform biomarkers.

Item Database Searching on INSPIRE (H.W. Wilson Company, 2004)
Hoskin, Adele
Have you ever searched a database and received a set of answers that were OK, but felt that more information was available if only you had the key? Many times a word or a phrase will produce interesting information but not exactly the information that is needed. Efficiently searching databases requires the separation of the search language and the database content. The purpose of this article is to discuss some searching skills to help separate the two and increase your precision. This paper will review basic searching, advanced searching, and some special features.

Item Finding the hard to find: locating newspapers, historic documents & international publications using the internet (2010-02-26T18:16:46Z)
Baich, Tina
Do you groan every time you see a newspaper, historic document or international publication interlibrary loan request? This presentation will discuss various resources that will help you locate these hard-to-find documents.
The focus will be Web-based finding aids and digital repositories that provide instant access to documents. Another key is tracking your finding aids so you can easily return to them, and you'll hear recommendations that make this easy to do. By the end of the program, you'll no longer be groaning!

Item Identification of Publications on Disordered Proteins from PubMed (2012-08-07)
Sirisha, Peyyeti; Xia, Yuni; Dunker, A. Keith; Chen, Jake
The literature on disordered proteins has been on the rise. As the number of publications increases, the time and effort needed to manually identify the relevant publications and protein information to add to the centralized repository (called DisProt) are becoming a critical burden. Existing search facilities on PubMed can retrieve a seemingly large number of publications based on keywords, but they do not support ranking those publications by the probability that the protein names mentioned in a given abstract will be added to DisProt. This thesis explores a novel system that uses disorder predictors and context-based dictionary methods to quickly identify publications on disordered proteins from the PubMed database. NLProt, which is built around Support Vector Machines, is used to identify protein names, and PONDR-FIT, an Artificial Neural Network-based meta-predictor, is used to identify protein disorder. The work done in this thesis is of immediate significance in identifying disordered protein names. We tested the new system on 100 abstracts from DisProt (these abstracts were found to be relevant to disordered proteins and were added to DisProt manually by the annotators). The system had an accuracy of 87% on this test set. We then took another 100 recently added abstracts from PubMed and ran our algorithm on them; this time it had an accuracy of 68%. We suggest improvements to increase the accuracy and believe that this system can be applied to identifying disordered proteins from the literature.