Browsing by Subject "classification"
Now showing 1 - 10 of 16
Item classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows (2020-05) Key, Melissa Chester; Boukai, Benzion; Ragg, Susanne; Katz, Barry; Mosley, Amber
Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to identify misidentified peptides from each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distribution of these two groups of distances. This is used to create a decision rule for each peptide based on counting the number of within-protein distances smaller than the threshold. Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. ClassCleaner is also applied to an LC-MS/MS proteomics data set and the Congressional Voting Records data set from the UCI machine learning repository. The latter is used to demonstrate that the algorithm is not specific to proteomics.

Item Classification of Intrinsically Disordered Regions and Proteins (American Chemical Society, 2014-07-09) van der Lee, Robin; Buljan, Marija; Lang, Benjamin; Weatheritt, Robert J.; Daughdrill, Gary W.; Dunker, A.
Keith; Fuxreiter, Monika; Gough, Julian; Gsponer, Joerg; Jones, David T.; Kim, Philip M.; Kriwacki, Richard W.; Oldfield, Christopher J.; Pappu, Rohit V.; Tompa, Peter; Uversky, Vladimir N.; Wright, Peter E.; Babu, M. Madan; Department of Biochemistry & Molecular Biology, IU School of Medicine

Item Energy Efficiency of Quantized Neural Networks in Medical Imaging (2022-04) Sinha, Priyanshu; Tummala, Sai Sreya; Purkayastha, Saptarshi; Gichoya, Judy W.; BioHealth Informatics, School of Informatics and Computing
The main goal of this paper is to compare the energy efficiency of quantized neural networks to perform medical image analysis on different processors and neural network architectures. Deep neural networks have demonstrated outstanding performance in medical image analysis but require high computation and power usage. In our work, we review the power usage and temperature of processors when running Resnet and Unet architectures to perform image classification and segmentation respectively. We compare Edge TPU, Jetson Nano, Apple M1, Nvidia Quadro P6000 and Nvidia A6000 to infer using full-precision FP32 and quantized INT8 models. The results will be useful for designers and implementers of medical imaging AI on hand-held or edge computing devices.

Item Enhancing a Taxonomy for Health Information Technology: An Exploratory Study of User Input Towards Folksonomy (2010) Dixon, Brian E.; McGowan, Julie J
The U.S. Agency for Healthcare Research and Quality has created a public website to disseminate critical information regarding its health information technology initiative. The website is maintained by AHRQ's National Resource Center (NRC) for Health Information Technology. In the latest continuous quality improvement project, the NRC used the site's search logs to extract user-generated search phrases. The phrases were then compared to the site's controlled vocabulary with respect to language, grammar, and search precision.
Results of the comparison demonstrate that search log data can be a cost-effective way to improve controlled vocabularies as well as information retrieval. User-entered search phrases were found to also share many similarities with folksonomy tags.

Item Geographical distribution and determining factors of different invasive ranks of alien species across China (Elsevier, 2020-06) Zhou, Quanlai; Wang, Yongcui; Li, Xuehua; Liu, Zhimin; Wu, Jing; Musa, Ala; Ma, Qu; Yu, Haibin; Cui, Xue; Wang, Lixin; Earth Sciences, School of Science
Determination of the geographical distribution and life-form spectra of alien species with different invasive abilities are essential to understand the process of invasion and to develop measures to manage alien species. Based on six classifications of Chinese alien species, environmental and social data, we determined species density, life-form spectrum of alien species, and the relationship between species density of alien species and climatic or social factors. The species density of alien species increased from the northwest to the southeast regions of China for all the six ranks. The boundary line between low and high species density of alien species was consistent with the dividing line of population density (the “Hu Line”). Mean annual precipitation was the most important factor for species density in malignant invaders, serious invaders, local invaders, and species requiring further observation (Ranks I, II, III, and V, respectively). Gross domestic product per square kilometer and annual minimum temperature were the most important factors in mild invaders and cultivated aliens (Ranks IV and VI, respectively). Annual and biennial herbs made up 52.9% to 71.2% of total species in Ranks I to IV; shrubs and trees 3.7% to 14.7%. The annual and biennial herbs were 35.5% and 32.6%, and the shrubs and trees were 25.3% and 31.6% in Ranks IV and VI.
Results implied that precipitation was the most important factor on species density for the invasive alien species. However, social factors and temperature were the most important factors for the non-invasive alien species. The invasive alien species had a high proportion of annual and biennial herbs and non-invasive alien had a high proportion of shrubs and trees. It is important to understand the geographical distribution and life-form spectra of various invasive alien species for alien species controls.

Item Impact of and Correction for Outcome Misclassification in Cumulative Incidence Estimation (Public Library of Science, 2015) Bakoyannis, Giorgos; Yiannoutsos, Constantin T.; Department of Biostatistics, School of Public Health
Cohort studies and clinical trials may involve multiple events. When occurrence of one of these events prevents the observance of another, the situation is called "competing risks". A useful measure in such studies is the cumulative incidence of an event, which is useful in evaluating interventions or assessing disease prognosis. When outcomes in such studies are subject to misclassification, the resulting cumulative incidence estimates may be biased. In this work, we study the mechanism of bias in cumulative incidence estimation due to outcome misclassification. We show that even moderate levels of misclassification can lead to seriously biased estimates in a frequently unpredictable manner. We propose an easy to use estimator for correcting this bias that is uniformly consistent. Extensive simulations suggest that this method leads to unbiased estimates in practical settings.
The proposed method is useful, both in settings where misclassification probabilities are known by historical data or can be estimated by other means, and for performing sensitivity analyses when the misclassification probabilities are not precisely known.

Item Improving protein order-disorder classification using charge-hydropathy plots (Springer (Biomed Central Ltd.), 2014) Huang, Fei; Oldfield, Christopher J.; Xue, Bin; Hsu, Wei-Lun; Meng, Jingwei; Liu, Xiaowen; Shen, Li; Romero, Pedro; Uversky, Vladimir N.; Dunker, A. Keith; Department of Biochemistry and Molecular Biology, IU School of Medicine
BACKGROUND: The earliest whole protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol, 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale. RESULTS: Using the performance of the C-H plot as the metric, we compared 19 alternative hydropathy scales, with the finding that the Guy (1985) hydropathy scale (Guy, Biophys. J, 47: 61-70 (1985)) was the best of the tested hydropathy scales for separating large collections of structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. Applying the C-H plot to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating a very substantial overall improvement is obtained by using different hydropathy scales.
A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, thus suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy. CONCLUSION: We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder.

Item Inferring the patient’s age from implicit age clues in health forum posts (Elsevier, 2022-01) Black, Christopher M.; Meng, Weilin; Yao, Lixia; Ben Miled, Zina; Electrical and Computer Engineering, School of Engineering and Technology
Broader patient-reported experiences in oncology are largely unknown due to the lack of available information from traditional data sources. Online health community data provide an exploratory way to uncover these experiences at a large scale. Analyzing these data can guide further studies towards understanding patients’ needs and experiences. However, analysis of online health data is inherently difficult due to the unstructured nature of these data and the variety of ways information can be expressed over text. Specifically, subscribers may not disclose critical information such as the age of the patient in their posts. In fact, the number of health forum posts that explicitly mention the age of the patient is significantly lower than the number of posts that do not include this information in the Reddit r/Cancer health forum under consideration in the present paper. Health-focused studies often need to consider or control for age as a confounder, hence the importance of having sufficient age data. This paper presents a methodology that can help classify health forum posts according to four age groups (0–17, 18–39, 40–64 and 65+ years) even when the posts do not contain explicit mention of the age of the patient. First, the subset of the posts that include explicit mention of the age of the patient is identified.
Second, the explicit age clues are removed from these posts and used to train the proposed age classifier. The resulting classifier is able to infer the age of the patient using only implicit age clues with an average true positive rate (TPR) of 71%. This TPR is comparable to the average TPR of 69% obtained from human annotations for the same set of posts.

Item International consensus recommendations for eosinophilic gastrointestinal disease nomenclature (Elsevier, 2022-02-16) Dellon, Evan S.; Gonsalves, Nirmala; Abonia, J. Pablo; Alexander, Jeffrey A.; Arva, Nicoleta C.; Atkins, Dan; Attwood, Stephen E.; Auth, Marcus K.H.; Bailey, Dominique D.; Biederman, Luc; Blanchard, Carine; Bonis, Peter A.; Bose, Paroma; Bredenoord, Albert J.; Chang, Joy W.; Chehade, Mirna; Collins, Margaret H.; Di Lorenzo, Carlo; Dias, Jorge Amil; Dohil, Ranjan; Dupont, Christophe; Falk, Gary W.; Ferreira, Cristina T.; Fox, Adam T.; Genta, Robert M.; Greuter, Thomas; Gupta, Sandeep K.; Hirano, Ikuo; Hiremath, Girish S.; Horsley-Silva, Jennifer L.; Ishihara, Shunji; Ishimura, Norihisa; Jensen, Elizabeth T.; Gutiérrez-Junquera, Carolina; Katzka, David A.; Khoury, Paneez; Kinoshita, Yoshikazu; Kliewer, Kara L.; Koletzko, Sibylle; Leung, John; Liacouras, Chris A.; Lucendo, Alfredo J.; Martin, Lisa J.; McGowan, Emily C.; Menard-Katcher, Calies; Metz, David C.; Miller, Talya L.; Moawad, Fouad J.; Muir, Amanda B.; Mukkada, Vincent A.; Murch, Simon; Nhu, Quan M.; Nomura, Ichiro; Nurko, Samuel; Ohtsuka, Yoshikazu; Oliva, Salvatore; Orel, Rok; Papadopoulou, Alexandra; Patel, Dhyanesh A.; Pesek, Robert D.; Peterson, Kathryn A.; Philpott, Hamish; Putnam, Philip E.; Richter, Joel E.; Rosen, Rachel; Ruffner, Melanie A.; Safroneeva, Ekaterina; Schreiner, Philipp; Schoepfer, Alain; Schroeder, Shauna R.; Shah, Neil; Souza, Rhonda F.; Spechler, Stuart J.; Spergel, Jonathan M.; Straumann, Alex; Talley, Nicholas J.; Thapar, Nikhil; Vandenplas, Yvan; Venkatesh, Rajitha D.; Vieira, Mario C.; von Arnim, Ulrike;
Walker, Marjorie M.; Wechsler, Joshua B.; Wershil, Barry K.; Wright, Benjamin L.; Yamada, Yoshiyuki; Yang, Guang-Yu; Zevit, Noam; Rothenberg, Marc E.; Furuta, Glenn T.; Aceves, Seema S.; Pediatrics, School of Medicine
Background & Aims: Substantial heterogeneity in terminology used for eosinophilic gastrointestinal diseases (EGID), particularly the catchall term “eosinophilic gastroenteritis”, limits clinical and research advances. We aimed to achieve an international consensus for standardized EGID nomenclature. Methods: This consensus process utilized Delphi methodology. An initial naming framework was proposed and refined in iterative fashion, then assessed in a first round of Delphi voting. Results were discussed in two consensus meetings, the framework was updated, and re-assessed in a second Delphi vote, with a 70% threshold set for agreement. Results: Of 91 experts participating, 85 (93%) completed the first and 82 (90%) completed the second Delphi surveys. Consensus was reached on all but two statements. “EGID” was the preferred umbrella term for disorders of GI tract eosinophilic inflammation in the absence of secondary causes (100% agreement). Involved GI tract segments will be named specifically and use an “Eo” abbreviation convention: eosinophilic gastritis (now abbreviated EoG), eosinophilic enteritis (EoN), and eosinophilic colitis (EoC). The term “eosinophilic gastroenteritis” is no longer preferred as the overall name (96% agreement). When >2 GI tract areas are involved, the name should reflect all of the involved areas. Conclusions: This international process resulted in consensus for updated EGID nomenclature for both clinical and research use. EGID will be the umbrella term rather than “eosinophilic gastroenteritis”, and specific naming conventions by location of GI tract involvement are recommended.
As more data are developed, this framework can be updated to reflect best practices and the underlying science.

Item Periodontal diagnosis and treatment planning – An assessment of the understanding of the new classification system (Wiley, 2022-12) Kakar, Arushi; Blanchard, Steven; Shin, Daniel; Maupomé, Gerardo; Eckert, George J.; John, Vanchit; Periodontology, School of Dentistry
Objectives: Substantial variations are seen among clinicians in the diagnosis and treatment planning of periodontal diseases. Accurate diagnosis and treatment planning are fundamental requirements for effective outcome-based patient care. The aim of this study was to evaluate the understanding of the American Academy of Periodontology and the European Federation of Periodontology 2017 periodontal disease classifications in diagnoses and treatment plans across four study groups. Methods: The study recruited at least 20 participants in each of the four study groups. These included 1) periodontal faculty and residents at Indiana University School of Dentistry (IUSD-PF), 2) IUSD general practice faculty (IUSD-GPF), 3) private practice periodontists (PPP), and 4) general practitioners (GP). The participants were provided with 10 HIPAA de-identified case records and a link to a survey. The survey comprised five demographic questions and two questions on diagnosis and treatment plan for each case along with a fixed list of responses. The responses were then compared against gold standards that were determined by a group of three board-certified periodontists. Results: Overall, for diagnostic questions, GP (69%) were correct significantly less often than IUSD-PF (86%, p < 0.001), IUSD-GPF (79%, p = 0.002), and PPP (80%, p = 0.001). No significant differences (p > 0.05) in the overall correct treatment plan responses were found among the four groups (IUSD-PF: 69%, IUSD-GPF: 62%, PPP: 68%, and GP: 60%).
The multi-rater kappas for within-group agreement on overall diagnosis ranged from 0.36 (GP) to 0.55 (IUSD-PF) and on overall treatment plan ranged from 0.32 (IUSD-GPF) to 0.42 (IUSD-PF). Overall agreement for diagnosis and treatment plans among the four groups was relatively low and none of the groups were statistically different from each other (p > 0.05). Conclusion: Regular participation in calibration sessions may lead to more accurate adoption of the 2017 periodontal classification and thereby help provide consistent diagnosis and treatment.