Informatics School Theses and Dissertations

Permanent URI for this collection

https://hdl.handle.net/1805/954

Please go to "Informatics Graduate Theses and PhD Dissertations" to submit dissertations and theses for the School of Informatics and Computing, at: http://hdl.handle.net/1805/303.

Spectral Deconvolution, Feature Detection, and Proteoform Identification for Top-Down Proteomics
(2024-12) Basharat, Abdul Rehman; Yan, Jingwen; Liu, Xiaowen; Zang, Yong; Wang, Juexin; Wan, Jun; Luo, Xiao
Liquid chromatography-based mass spectrometry (LC-MS) is widely used for proteoform identification, characterization, and quantitation. Bottom-up proteomics analyzes enzymatically digested peptides, while top-down proteomics examines intact proteoforms, enabling comprehensive identification of proteoforms with post-translational modifications (PTMs), genetic mutations, and alternative splicing. In MS data, due to the occurrence of different isotopes, proteins with the same chemical composition and charge state produce a group of peaks with different mass-to-charge ratios (m/z), called an isotopic envelope. A top-down mass spectrum often contains hundreds of high-charge state envelopes, some of which overlap. Consequently, analyzing top-down MS data presents computational challenges due to the complexity of top-down spectra. This dissertation introduces three new software tools, EnvCNN, TopFD, and TopDIA, for enhancing proteoform identification, characterization, and quantification in top-down MS data analysis. EnvCNN is a deep-learning model for evaluating isotopic envelopes of proteoforms and their fragments. This model aims to improve the accuracy of reporting fragments, thus increasing the number of identified proteoforms and improving the reliability of proteoform identification and characterization. TopFD is a software tool for proteoform feature detection, grouping all peaks of a proteoform in an LC-MS map into a single feature. TopFD outperforms other existing tools in the accuracy and reproducibility of feature detection, thereby improving proteoform identification and quantification. TopDIA is the first software tool for proteoform identification by top-down data-independent acquisition MS (TD-DIA-MS).
Unlike conventional top-down data-dependent acquisition MS (TD-DDA-MS), which relies on intensity-based proteoform selection to generate fragment mass spectra, TD-DIA-MS fragments all proteoforms within predefined isolation windows, generating fragment mass spectra for every proteoform. TopDIA processes TD-DIA-MS data to generate demultiplexed pseudo spectra, which are searched against a protein database for proteoform identification, leading to a significant increase in the number of identified proteoforms compared with TD-DDA-MS. In summary, these new software tools help advance proteomics research by increasing the accuracy and comprehensiveness of proteoform analysis by top-down MS.
A Model of Project Continuation in Game Jams and Hackathons
(2024-08) Faas, Travis Byron; Miller, Andrew; Dombrowski, Lynn; Brady, Erin; Hickey, Daniel
Game jams and hackathons are events where individuals design and build new technology prototypes in a short timeframe. Prototypes made at hackathons are often abandoned after the event and are never finished or used by their intended audiences. Though continued work on prototypes is not the only goal of hackathons, many expect that some hackathon projects will continue to be developed to fulfill the civic, educational, or entrepreneurial goals of hackathon organizers and attendees. To assist hackathon organizers in running hackathons that produce continued projects, I present in this document a model of project continuation in online hackathons and a tool that directs conversations that develop the necessary components of continuation. This model was developed through three studies: a design study that generated the design for a bot to be used in an online game jam that directs users in breaking the boundedness of their game concept, a deployment study where the bot was deployed and used in an online game jam, and a longitudinal study that followed the continuation practices of individuals who used the bot during the jam. In the presented continuation model, I highlight how recent personal interests generate an extended development context that reduces the boundedness of game jams and show how regular sharing and discussion of progress creates social investment in the success of projects that contributes to continuation intention and support. This continuation model requires a resting period post-hackathon, which sometimes generates conceptual continuation where a project is abandoned but the major project concepts are explored in later projects. Taking this idea of concept continuation further, I offer suggestions on how to gain continuation in hackathons by reducing their time-boundedness and making the events more permeable to allow for prior-existing projects to be accepted and further developed at these events.
Elucidating Chemotherapy Resistance in Breast Cancer Through Advanced Subpathway Analysis Algorithm: A Novel Approach to Topological Interpretation of Transcriptomic Data
(2024-08) Huo, Yang; Yan, Jingwen; Li, Lang; Zhang, Chi; Zhang, Pengyue; Wang, Juexin
Chemotherapy resistance in breast cancer, particularly Triple-Negative Breast Cancer (TNBC), poses a significant challenge to effective treatment, contributing to high mortality rates. This thesis investigates the molecular mechanisms underlying chemoresistance, focusing on enhancing the granularity of pathway analysis through an innovative sub-pathway analysis algorithm. Traditional pathway analyses, while providing fundamental insights, often overlook the intricate and individual-specific nature of chemoresistance. The research introduces an advanced sub-pathway analysis algorithm that dissects larger pathways into smaller, more detailed sub-pathways, allowing for precise exploration of molecular interactions driving chemoresistance. The methodology involves a comparative analysis of transcriptome profiles from breast cancer patients before and after chemotherapy, utilizing both established and new sub-pathway analytical techniques. This integrative approach aims to uncover previously unrecognized mechanisms of resistance and identify potential biomarkers for chemoresistance. Furthermore, the thesis presents the development of a new algorithm, i-Subway, designed to conduct sub-pathway analyses at the individual sample level. This algorithm incorporates both inhibitory and inductive relationships within sub-pathways and integrates the empirical Bayes statistical model with the topological structure of the sub-pathway, significantly improving computational efficiency. When applied to transcriptomic data from 56 breast cancer cell lines, i-Subway revealed substantial variation at the sub-pathway level, providing deeper insights into the molecular basis of chemoresistance. Overall, this thesis aims to enhance the understanding of the specific pathways and sub-pathways altered in response to chemotherapy, offering new insights into the molecular mechanisms of chemoresistance in breast cancer. 
The findings are expected to facilitate the identification of novel therapeutic targets and contribute to the development of more effective, individualized treatment strategies.
Integrating Imaging and Genetics Data for Improved Understanding and Detection of Alzheimer's Disease
(2024-08) He, Bing; Janga, Sarath Chandra; Saykin, Andrew J.; Yu, Meichen; Yan, Jingwen
Alzheimer’s disease (AD) is a progressive and irreversible brain disorder characterized by a slow and intricate progression, in which the initial pathological changes occur long before noticeable symptoms. AD is highly heritable, and genetic factors play an essential role in AD development. Large-scale genome-wide association studies have identified numerous SNPs related to AD. However, our understanding of the connections between genetic findings and altered brain phenotypes is still limited. Brain imaging genetics, an emerging approach, aims to investigate the relationship between genetic variations and brain structure or function. It has great potential to provide insights into the underlying biological mechanisms and to enable the early detection of AD. Our study aimed to develop and apply novel computational approaches for more robust discovery of imaging genetics associations and for improved detection of AD at an early stage. Specifically, we focused on addressing the heterogeneity problem inherent in integrating imaging and genetics data. In aim 1, we applied a novel biclustering method to associate genetic variations with functional brain connectivity altered in AD patients. In aim 2, we proposed a novel strategy to integrate imaging and genetic data to serve as a new type of prior knowledge and investigated their role in guiding imaging genetics association. Finally, in aim 3, we proposed a multi-factorial pseudotime approach to integrate heterogeneous genotype and amyloid imaging data and examined its potential for staging and early detection of AD. Collectively, results from these aims are intended to enhance our understanding and detection of AD, providing valuable information to inform therapeutic strategies to slow or halt disease progression.
Techniques for Improving the Robustness of Visual Analytics
(2024-08) Koonchanok, Ratanond; Reda, Khairi; Chakraborty, Sunandan; Cafaro, Francesco; McCabe, Sean
Interactive visualization systems, such as Tableau, are integral parts of the data analysis workflow. While such tools were built to help analysts perform exploratory data analysis with minimal effort, analysts have also been using them to make statistical inferences (e.g., predicting future trends) based on patterns revealed by the dataset. However, in addition to revealing true patterns, visualizations can also surface noise and other random fluctuations in data, which could lead to spurious discoveries. The latter poses a threat to the trustworthiness of analyses, especially given the increased reliance on visualizations across various domains. My central thesis is that it is possible to reduce the incidence of false discovery by introducing lightweight user interface elements in visualization tools. In particular, I propose eliciting and incorporating analyst beliefs into visualizations as an approach for guarding against spurious patterns and reducing the risk of analysts “overfitting” the data. To study how analysts would respond to such intervention, I first designed an interactive tool that combined visual belief elicitation with traditional visualization functionalities. In a qualitative study with data analysts, the tool appeared to allow users to operationalize their working knowledge into analyses, nudging them to adopt normative analysis practices (e.g., specifying hypotheses before peeking at data). I then conducted a crowdsourced experiment to investigate if this design could indeed help reduce the incidence of false discovery. Compared to a control condition, participants who used our intervention made significantly more accurate inferences and reported fewer false discoveries. Lastly, I investigated the capability of human intuition by comparing inferences from participants against those generated by statistical machines to understand the advantages and limitations of each. 
Overall, my thesis paves the way toward the development of a robust visual analytics system that facilitates collaborative decision-making processes, leveraging the complementary abilities of humans and machines.
Multi-omics Investigation into Alzheimer's Disease: Functional Mechanism and Early Detection
(2024-08) Pugalenthi, Pradeep Varathan; Yan, Jingwen; Janga, Sarath Chandra; Nho, Kwangsik; Wang, Juexin
Alzheimer’s disease (AD), a multi-factorial and highly heritable condition, stands as the foremost contributor to dementia. Despite its early discovery and extensive study, our understanding of the underlying pathogenesis of AD remains incomplete. This thesis addresses critical aspects of AD through a multi-omics approach for improved understanding of underlying functional mechanisms and for improved precision in early detection. Multi-omics integration allows us to explore a wide spectrum of AD-related changes at different biological levels, including genomics and metabolomics, and how they associate with the biomarkers. In the first aim, I performed an integrative analysis of summary statistics from genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) analysis. Results of this study confirmed the potential of integrative GWAS and eQTL analysis in estimating transcriptomic changes when tissue-specific expression data are lacking, and provided important insights into tissue-specific downstream biology of observed GWAS associations in AD. In the second aim, I took a step further and hypothesized the epistatic effect of GWAS findings and neighboring variants on the downstream functional mechanism. Leveraging the recent advances in sequence-based genome annotation, I investigated the tissue-specific effects of top AD GWAS variants on the chromatin profiles. With in-silico mutagenesis, GWAS variants were found to function via either lead effect or epistatic effect, highlighting the limitation of the existing focus on single-variant-based function annotation. In the last aim, I built a comprehensive bioinformatics pipeline to investigate the potential of metabolic age as an early indicator for AD progression, in which we also observed significant differences between sex groups. We identified strong associations of metabolic age with longitudinal changes of current diagnostic metrics in the ATN framework, suggesting the potential of metabolic age as an early biomarker.
Collectively, results from these aims contribute to advancing our understanding of AD and provide valuable insights for future research and clinical applications.
Predictive Molecular Biomarkers for Human Health Risk
(2024-07) Jiang, Guanglong; Liu, Yunlong; Yan, Jingwen; Wan, Jun; Wang, Juexin
Molecular biomarkers play vital roles in disease risk assessment, personalized treatment selection and therapy response monitoring. This thesis explores the use of diverse molecular biomarkers for the assessment of human health risks, primarily in cancers. MiRNAs and their isoforms (isomiRs) are promising biomarker candidates due to their comprehensive regulation of gene expression and involvement in physiological and pathological processes. The first study demonstrates that genetic variations in miRNA precursor regions influence the biogenesis of isomiRs in 95 SNP-isomiR pairs. Notably, we identified a SNP (rs6505162) impacting hsa-miR-423 isomiRs, potentially linked to breast cancer pathogenesis, suggesting their potential as biomarkers in disease assessment. The findings also highlight the mechanism of genetic regulation of isomiR generation and advance our understanding of miRNA-mediated post-transcriptional regulation. Secondly, we explored the capacity of aberrant intron-retention neoantigen burden (INB) to predict the response to immune checkpoint inhibitors (ICI) in metastatic cancers. Both INB and tumor mutation burden (TMB) were strong predictors of ICI therapy duration (p = 0.019 and 0.038, respectively), with patients exhibiting elevated levels demonstrating exceptional treatment duration. Patients with high INB or TMB had improved overall survival (OS) (p = 1.1×10⁻⁴). Importantly, INB and TMB were uncorrelated, indicating that they capture distinct aspects of tumor neoantigen. Together, the combined assessment of INB and TMB offers improved accuracy in predicting clinical response to ICI therapies. Finally, we extend the application of molecular biomarkers to the assessment of minimal residual disease for risk stratification in triple negative breast cancer (TNBC) with residual disease after neoadjuvant chemotherapy.
Detection of circulating tumor DNA (ctDNA) was a significant predictor of inferior distant disease-free survival (DDFS) (p = 0.006), disease-free survival (DFS) (p = 0.009) and OS (p = 0.002). The combination of circulating tumor cell (CTC) and ctDNA markers provided superior sensitivity and prognostic value. In conclusion, the studies provide compelling evidence for the utility of diverse molecular biomarkers, including miRNA isoforms, abnormal splicing-based neoantigen metrics and circulating tumor DNA, in disease prediction and treatment efficacy assessment. By elucidating the roles of diverse biomarkers in predicting cancer pathogenesis and therapeutic response, we pave the way towards more personalized and effective approaches to managing human health risks.
Computational Methods for Proteoform Identification and Characterization Using Top-Down Mass Spectrometry
(2023-12) Chen, Wenrong; Yan, Jingwen; Wang, Juexin; Wan, Jun; Zang, Yong; Luo, Xiao; Liu, Xiaowen
Proteoforms, distinct molecular forms of proteins, arise due to numerous factors such as genetic mutations, differential gene expression, alternative splicing, and a range of biological processes. These proteoforms are often characterized by primary structural variances such as amino acid substitutions, terminal truncations, and post-translational modifications (PTMs). Proteoforms from the same proteins can manifest varied functional behaviors based on the specific alterations. The complexity inherent to proteoforms has elevated the significance of top-down mass spectrometry (MS) due to its proficiency in providing intricate sequence information for these intact proteoforms. During a typical top-down MS experiment, intact proteoforms are separated through platforms like liquid chromatography (LC) or capillary zone electrophoresis (CZE) prior to tandem mass spectrometry (MS/MS) analysis. Despite advancements in instruments and protocols for top-down MS, computational challenges persist, with software tool development still in its early stage. In this dissertation, our research revolves around three primary goals, all aimed at refining proteoform characterization. First, we bridge RNA-Seq with top-down MS for better proteoform identification. We propose TopPG, an innovative proteogenomic tool that, in contrast to traditional approaches, is tailored to generate proteoform sequence databases from genetic and splicing variations explicitly for top-down MS. Second, to boost the accuracy of proteoform detection, we utilize machine learning methods to predict proteoform retention and migration times in top-down MS, critically evaluating models in an area traditionally dominated by bottom-up MS methodologies. Lastly, recognizing the indispensable role of PTMs in cellular functions, we introduce PTM-TBA.
This tool integrates the complementary strengths of both top-down and bottom-up MS, augmented with annotations, building a comprehensive strategy for precise PTM identification and localization.
Survivor-Centered Transformative Justice: An Approach to Designing Sociotechnical Systems Alongside Domestic Violence Stakeholders in US Muslim Communities
(2023-08) Rabaan, Hawra; Dombrowski, Lynn; Bolchini, Davide; Brady, Erin; Khaja, Khadija; Schoenebeck, Sarita
Domestic violence (DV) is a social, political, and legal problem that requires contextual examination. In the United States, earlier advocacy work focused on law reform to empower survivors in influencing the public and state to take DV seriously and provide resources to support and protect survivors. However, harm is still perpetuated systemically and socially for survivors, especially those from racial and religious minorities. In this dissertation, I focus on domestic violence within the US-based Muslim population due to the unique issues Muslim survivors face when dealing with governmental services and service providers (e.g., gendered Islamophobia, racial discrimination, punitive actions) and within the Muslim community itself (e.g., community trauma, faith leaders lacking appropriate training). This work incorporates three phases of research that utilize qualitative and design methods to examine the forms and dynamics of domestic violence, help-seeking and healing challenges, and survivor advocacy, abuser accountability, and community transformation interventions. I argue that to pursue justice for survivors in design research, a multifaceted approach rooted in principles from Islamic feminism, trauma-informed care, and restorative and transformative justice tenets is needed. Consequently, I propose Survivor-Centered Transformative Justice (SCTJ), a framework to discern individual and systemic harm, to understand how to design alongside victim-survivors, and to focus on victim-survivors' autonomy. I illustrate how SCTJ allows researchers and designers to account for individual inequalities, recognize communities' preferred approaches to pursuing justice, tackle the underlying conditions enabling harm, and provide interventions that alter, repair, and reduce harm within different scales of relationships.
Additionally, I present the concept of healing structures, which aim to safeguard against harmful community practices, discriminatory laws, and practices while facilitating collective and survivor-centered interventions to promote healing. Lastly, I demonstrate the potential for design research to progress by taking a closer look into the belief systems, cultural values, and surrounding conditions that contribute to users' obtainable choices and decision-making processes, and by centering the needs of people at the margins. With this empirical, theoretical, and design work, I present insights that inform the HCI community at the intersection of social justice-oriented design, Islamic feminism, and gender-based violence.
Understanding Informational Practices and Exploring Data Collection Approaches for Quality of Life in Brain Injury Illness Management
(2023-07) Masterson, Yamini Lalama Patnaik; Brady, Erin; Miller, Andrew D.; Toscos, Tammy; Hong, Youngbok; Gunter, Tracy D.
Brain injury, a combination of medical injury, chronic illness, and impairment, affects more than 3.5 million people in the United States every year through an interplay of physiological, psychological, environmental, and cultural factors spanning clinical recovery, illness management, and personal recovery phases. The lack of collaborative and integrated understanding from healthcare and accessibility communities led to treating brain injury as localized damage rather than an individual response to ever-changing impairment and symptoms, focusing primarily on clinical recovery until recently. While self-tracking and management technologies have been widely successful in measuring individual symptoms, they have struggled to facilitate sensemaking and problem solving to achieve a consistent biopsychosocial awareness of illness. My dissertation addresses this gap through three aims: (1) investigate the current informational practices of individuals undergoing post-acute brain injury recovery, (2) explore technology-agnostic approaches for data collection and their impact on sensemaking processes and conceptual understanding of brain injury, and (3) develop guidelines for designing data collection tools that facilitate sensemaking in brain injury self-management. I achieve this through two longitudinal studies: an interview study that introduced participants to the framework on quality of life after traumatic brain injury (QoLIBRI) and a narrative study that used the QoLIBRI framework for structured journaling and co-design of individualized data collection tools. The goal of this work is to improve the self-awareness of individuals with brain injury, enabling them to anticipate or recognize the occurrence of a challenge caused by impairment and then utilize assistive technologies to bypass the limitation. It also has implications for involving neurodiverse populations in research and technology design.
