Browsing by Subject "Support Vector Machine"
Length-dependent prediction of protein intrinsic disorder
(BioMed Central, 2006-04-17) Peng, Kang; Radivojac, Predrag; Vucetic, Slobodan; Dunker, A. Keith; Obradovic, Zoran; Biology, School of Science
Background: Due to the functional importance of intrinsically disordered proteins and protein regions, predicting intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed by the 6th Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues), with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is that the amino acid composition and sequence properties of disordered regions depend on region length.
Results: We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length dependency in the prediction of intrinsic protein disorder. Both are similar to the original VSL1 predictor used in the CASP6 experiment. In each model, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final model. In 10-fold cross-validation, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons on the VSL2 training dataset (via 10-fold cross-validation) and on a blind-test set of unrelated recent PDB chains indicated that the VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder.
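A minimal sketch of the two-stage design described above, assuming each trained model is available as a Python function that maps a sequence and a residue index to a probability. The names `meta_combine` and `call_disorder`, the toy scorers, and the gating rule (the meta predictor's output used as the weight on the long-region specialist) are hypothetical; the abstract does not state exactly how VSL2's meta predictor combines its inputs.

```python
from typing import Callable, List

# Hypothetical model interface: (sequence, residue index) -> probability in [0, 1].
Scorer = Callable[[str, int], float]


def meta_combine(
    sequence: str,
    short_model: Scorer,
    long_model: Scorer,
    meta_model: Scorer,
) -> List[float]:
    """Per-residue disorder scores from two length-specialized predictors.

    Assumption: meta_model returns the probability that residue i lies in a
    long (>30 residue) disordered region; that probability weights the
    long-region specialist against the short-region specialist.
    """
    scores: List[float] = []
    for i in range(len(sequence)):
        w_long = meta_model(sequence, i)
        if not 0.0 <= w_long <= 1.0:
            raise ValueError(f"meta score out of range at residue {i}: {w_long}")
        p_short = short_model(sequence, i)
        p_long = long_model(sequence, i)
        # Convex combination of the two specialists for this residue.
        scores.append(w_long * p_long + (1.0 - w_long) * p_short)
    return scores


def call_disorder(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Label a residue as disordered when its combined score reaches the threshold."""
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    # Toy stand-ins for trained models, purely illustrative.
    def short(s: str, i: int) -> float:
        return 0.9 if s[i] in "PESQK" else 0.1

    def long_(s: str, i: int) -> float:
        return 0.7 if s[i] in "PESQK" else 0.3

    def meta(s: str, i: int) -> float:
        return 0.5

    print(call_disorder(meta_combine("MPEEKV", short, long_, meta)))
```

Gating on a meta score keeps both specialists intact and lets the meta stage decide, residue by residue, which one to trust.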
Conclusion: The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that our previous disorder predictors often misclassify. The success of the VSL2 predictors further confirms the previously observed differences in amino acid composition and sequence properties between short and long disordered regions, and justifies our approach of modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use at http://www.ist.temple.edu/disprot/predictorVSL2.php

Predicting siRNA potency with random forests and support vector machines
(BMC, 2010-12-01) Wang, Liangjiang; Huang, Caiyan; Yang, Jack Y.; Medicine, School of Medicine
Background: Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. Many siRNA molecules may be designed for a target gene of interest, but their efficiency at inhibiting expression often varies.
Results: To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Because there were many potential sequence features, random forests were used to select the features most relevant to gene expression inhibition. Support vector machine classifiers were then constructed from the selected sequence features to predict siRNA potency. Interestingly, gene expression inhibition is significantly affected by the nucleotide dimer and trimer composition of the siRNA sequence.
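The abstract reports that dimer and trimer composition matters but does not give the exact encoding. One plausible encoding, sketched below under that assumption, turns a sequence into the 16 dimer and 64 trimer frequencies of its overlapping k-mers. The function names and the example sequence are hypothetical, and the random-forest selection and SVM stages are not shown.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGU"  # RNA alphabet; assumes sequences are written 5' to 3'


def kmer_composition(seq: str, k: int) -> Dict[str, float]:
    """Fraction of the overlapping k-mers in seq that equal each possible k-mer."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    invalid = set(seq) - set(NUCLEOTIDES)
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    # Every possible k-mer starts at zero so the feature vector has a fixed size.
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return {kmer: n / total for kmer, n in counts.items()}


def dimer_trimer_features(seq: str) -> List[float]:
    """16 dimer + 64 trimer composition values in a fixed lexicographic order."""
    features: List[float] = []
    for k in (2, 3):
        comp = kmer_composition(seq, k)
        features.extend(comp[kmer] for kmer in sorted(comp))
    return features


if __name__ == "__main__":
    # Made-up 21-nt sequence, for illustration only.
    print(len(dimer_trimer_features("UUCGAAGUAUUCCGCGUACGU")))  # 80
```

A fixed-length vector like this, one per candidate siRNA, is the kind of input a feature-selection step and a downstream classifier would consume.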
Conclusions: The findings in this study should help in designing potent siRNAs for functional genomics, and might also provide further insight into the molecular mechanism of RNA interference.

Silent speech recognition in EEG-based brain computer interface
(2015) Ghane, Parisa; Li, Lingxi; Tovar, Andres; Christopher, Lauren Ann; King, Brian
A Brain Computer Interface (BCI) is a hardware and software system that establishes direct communication between the human brain and the environment. In a BCI system, brain messages pass through wires and external computers instead of the normal pathway of nerves and muscles. The general workflow in all BCIs is to measure brain activity, process it, and then convert it into an output a computer can read. The measurement of electrical activity in different parts of the brain is called electroencephalography (EEG). Many sensor technologies, with different numbers of electrodes, are used to record brain activity along the scalp. Each electrode captures a weighted sum of the activity of all neurons in the area around it. To establish a BCI system, a set of electrodes must be placed on the scalp, together with a means of sending the signals to a computer, where a system is trained to find the important information, extract it from the raw signal, and use it to recognize the user's intention. Finally, a control signal should be generated according to the application. This thesis describes the step-by-step training and testing of a BCI system that can be used by a person who has lost the ability to speak through an accident or surgery but still has healthy brain tissue. The goal is to establish an algorithm that recognizes different vowels from EEG signals.
The thesis uses a bandpass filter to remove noise and artifacts from the signals, a periodogram for feature extraction, and a Support Vector Machine (SVM) for classification.

Uncertainty Quantification by Convolutional Neural Network Gaussian Process Regression with Image and Numerical Data
(AIAA, 2022-01) Yin, Jianhua; Du, Xiaoping; Mechanical and Energy Engineering, School of Engineering and Technology
Uncertainty Quantification (UQ) plays a critical role in engineering analysis and design. Regression is commonly employed to construct surrogate models that replace expensive simulation models for UQ. Classical regression methods suffer from the curse of dimensionality, especially when image data and numerical data coexist, which makes UQ computationally unaffordable. In this work, we propose a Convolutional Neural Network (CNN)-based framework that accommodates both image and numerical data. We first transform the numerical data into images and then combine them with the existing image data. The combined images are fed to the CNN for regression. To obtain the model uncertainty, we integrate the CNN with a Gaussian Process (GP), which results in the mixed network CNN-GP. The simulation results show that CNN-GP can build accurate surrogate models for UQ with mixed data and can also provide the uncertainty associated with the model prediction.

Weighted-Support Vector Machine Learning Classifier of Circulating Cytokine Biomarkers to Predict Radiation-Induced Lung Fibrosis in Non-Small-Cell Lung Cancer Patients
(Frontiers Media, 2021-02-01) Yu, Hao; Lam, Ka-On; Wu, Huanmei; Green, Michael; Wang, Weili; Jin, Jian-Yue; Hu, Chen; Jolly, Shruti; Wang, Yang; Kong, Feng-Ming Spring; BioHealth Informatics, School of Informatics and Computing
Background: Radiation-induced lung fibrosis (RILF) is an important late toxicity in patients with non-small-cell lung cancer (NSCLC) after radiotherapy (RT). Clinically significant RILF can impair quality of life and/or cause non-cancer-related death.
This study aimed to determine whether pre-treatment plasma cytokine levels have a significant effect on the risk of RILF and to investigate the ability of machine learning algorithms to predict that risk.
Methods: This is a secondary analysis of prospective studies from two academic cancer centers. The primary endpoint was grade ≥2 RILF (RILF2), classified according to a system consistent with the consensus recommendation of an expert panel of the AAPM task for normal tissue toxicity. Eligible patients must have had at least 6 months' follow-up after the start of radiotherapy. Baseline levels of 30 cytokines, together with dosimetric and clinical characteristics, were analyzed. A support vector machine (SVM) algorithm was applied for model development. Data from one center were used for model training and development, and data from the other center served as an independent external validation set.
Results: There were 57 and 37 eligible patients in the training and validation datasets, with RILF2 rates of 14% and 16.2%, respectively. Of the 30 plasma cytokines evaluated, SVM identified baseline circulating CCL4 as the most significant cytokine associated with RILF2 risk in both datasets (P = 0.003 and 0.07 for the training and test sets, respectively). An SVM classifier predictive of RILF2 was generated in Cohort 1 with CCL4, mean lung dose (MLD), and chemotherapy as key model features. This classifier was validated in Cohort 2 with an accuracy of 0.757 and an area under the curve (AUC) of 0.855.
Conclusions: Using machine learning, this study constructed and validated a weighted-SVM classifier that combines circulating CCL4 levels with significant dosimetric and clinical parameters and predicts RILF2 risk with reasonable accuracy. Further studies with larger sample sizes are needed to validate the role of CCL4 and of this SVM classifier in predicting RILF2.
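The abstract does not specify the kernel or the weighting scheme of its weighted SVM. As an assumption-labelled sketch, the code below trains a linear SVM whose hinge-loss terms are scaled per class by n_samples / (n_classes × class count), the same formula scikit-learn uses for `class_weight="balanced"`, so that a minority class (such as patients with RILF2) is not swamped by the majority. It is written from scratch with full-batch subgradient descent. `train_weighted_linear_svm`, `predict`, and the synthetic one-feature data in the demo are hypothetical and do not reproduce the study's model.

```python
from typing import Dict, List, Sequence, Tuple


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * n_in_class)."""
    labels = list(labels)
    classes = sorted(set(labels))
    n = len(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def train_weighted_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on a class-weighted, L2-regularized hinge loss.

    Labels must be -1 or +1. Objective minimized:
        lam/2 * ||w||^2 + (1/n) * sum_i c[y_i] * max(0, 1 - y_i * (w . x_i + b))
    """
    if set(y) - {-1, 1}:
        raise ValueError("labels must be -1 or +1")
    c = balanced_class_weights(y)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty (bias not penalized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # inside the margin: weighted hinge term is active
                for j in range(d):
                    grad_w[j] -= c[yi] * yi * xi[j] / n
                grad_b -= c[yi] * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Synthetic, imbalanced toy data (4 negatives, 2 positives), illustration only.
    X = [[-2.0], [-1.5], [-1.2], [-1.0], [1.5], [2.0]]
    y = [-1, -1, -1, -1, 1, 1]
    w, b = train_weighted_linear_svm(X, y)
    print(w, b, [predict(w, b, x) for x in X])
```

With balanced weights, each class contributes equally to the total hinge loss regardless of its size; a kernel SVM would apply the same per-class scaling to its penalty parameter.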