Informatics School Theses and Dissertations

Please go to "Informatics Graduate Theses and PhD Dissertations" to submit dissertations and theses for the School of Informatics and Computing, at: http://hdl.handle.net/1805/303.

Recent Submissions

Now showing 1 - 10 of 195
  • Item
    Multi-omics Investigation into Alzheimer's Disease: Functional Mechanism and Early Detection
    (2024-08) Pugalenthi, Pradeep Varathan; Yan, Jingwen; Janga, Sarath Chandra; Nho, Kwangsik; Wang, Juexin
    Alzheimer’s disease (AD), a multi-factorial and highly heritable condition, stands as the foremost contributor to dementia. Despite its early discovery and extensive studies, our understanding of the underlying pathogenesis of AD remains incomplete. This thesis addresses critical aspects of AD through a multi-omics approach for improved understanding of underlying functional mechanisms and for improved precision in early detection. Multi-omics integration allows us to explore a wide spectrum of AD-related changes at different biological levels, including genomics and metabolomics, and how they associate with the biomarkers. In the first aim, I performed an integrative analysis of summary statistics from genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) analysis. Results of this study confirmed the potential of integrative GWAS and eQTL analysis in estimating transcriptomic changes when tissue-specific expression data are lacking, and provided important insights into tissue-specific downstream biology of observed GWAS associations in AD. In the second aim, I took a step further and hypothesized an epistatic effect of GWAS findings and neighboring variants on the downstream functional mechanism. Leveraging the recent advances in sequence-based genome annotation, I investigated the tissue-specific effects of top AD GWAS variants on the chromatin profiles. With in-silico mutagenesis, GWAS variants were found to function via either lead effect or epistatic effect, pinpointing the limitation of the existing focus on single-variant-based function annotation. In the last aim, I built a comprehensive bioinformatics pipeline to investigate the potential of metabolic age as an early indicator for AD progression, in which we also observed a significant difference between sex groups. We identified strong associations of metabolic age with longitudinal changes of current diagnostic metrics in the ATN framework, suggesting the potential of metabolic age as an early biomarker. Collectively, results from these aims contribute to advancing our understanding of AD and provide valuable insights for future research and clinical applications.
  • Item
    A Model of Project Continuation in Game Jams and Hackathons
    (2024-08) Faas, Travis Byron; Miller, Andrew; Dombrowski, Lynn; Brady, Erin; Hickey, Daniel
    Game jams and hackathons are events where individuals design and build new technology prototypes in a short timeframe. Prototypes made at hackathons are often abandoned after the event and are never finished or used by their intended audiences. Though continued work on prototypes is not the only goal of hackathons, many expect that some hackathon projects will continue to be developed to fulfill the civic, educational, or entrepreneurial goals of hackathon organizers and attendees. To assist hackathon organizers in running hackathons that produce continued projects, I present in this document a model of project continuation in online hackathons and a tool that directs conversations that develop the necessary components of continuation. This model was developed through three studies: a design study that generated the design for a bot to be used in an online game jam that directs users in breaking the boundedness of their game concept, a deployment study where the bot was deployed and used in an online game jam, and a longitudinal study that followed the continuation practices of individuals who used the bot during the jam. In the presented continuation model, I highlight how recent personal interests generate an extended development context that reduces the boundedness of game jams and show how regular sharing and discussion of progress creates social investment in the success of projects that contributes to continuation intention and support. This continuation model requires a resting period post-hackathon, which sometimes generates conceptual continuation where a project is abandoned but the major project concepts are explored in later projects. Taking this idea of concept continuation further, I offer suggestions on how to gain continuation in hackathons by reducing their time-boundedness and making the events more permeable to allow for prior-existing projects to be accepted and further developed at these events.
  • Item
    Elucidating Chemotherapy Resistance in Breast Cancer Through Advanced Subpathway Analysis Algorithm: A Novel Approach to Topological Interpretation of Transcriptomic Data
    (2024-08) Huo, Yang; Yan, Jingwen; Li, Lang; Zhang, Chi; Zhang, Pengyue; Wang, Juexin
    Chemotherapy resistance in breast cancer, particularly Triple-Negative Breast Cancer (TNBC), poses a significant challenge to effective treatment, contributing to high mortality rates. This thesis investigates the molecular mechanisms underlying chemoresistance, focusing on enhancing the granularity of pathway analysis through an innovative sub-pathway analysis algorithm. Traditional pathway analyses, while providing fundamental insights, often overlook the intricate and individual-specific nature of chemoresistance. The research introduces an advanced sub-pathway analysis algorithm that dissects larger pathways into smaller, more detailed sub-pathways, allowing for precise exploration of molecular interactions driving chemoresistance. The methodology involves a comparative analysis of transcriptome profiles from breast cancer patients before and after chemotherapy, utilizing both established and new sub-pathway analytical techniques. This integrative approach aims to uncover previously unrecognized mechanisms of resistance and identify potential biomarkers for chemoresistance. Furthermore, the thesis presents the development of a new algorithm, i-Subway, designed to conduct sub-pathway analyses at the individual sample level. This algorithm incorporates both inhibitory and inductive relationships within sub-pathways and integrates the empirical Bayes statistical model with the topological structure of the sub-pathway, significantly improving computational efficiency. When applied to transcriptomic data from 56 breast cancer cell lines, i-Subway revealed substantial variation at the sub-pathway level, providing deeper insights into the molecular basis of chemoresistance. Overall, this thesis aims to enhance the understanding of the specific pathways and sub-pathways altered in response to chemotherapy, offering new insights into the molecular mechanisms of chemoresistance in breast cancer. 
    The findings are expected to facilitate the identification of novel therapeutic targets and contribute to the development of more effective, individualized treatment strategies.
  • Item
    Integrating Imaging and Genetics Data for Improved Understanding and Detection of Alzheimer's Disease
    (2024-08) He, Bing; Janga, Sarath Chandra; Saykin, Andrew J.; Yu, Meichen; Yan, Jingwen
    Alzheimer’s disease (AD) is a progressive and irreversible brain disorder characterized by a slow and intricate progression, in which the initial pathological changes occur long before noticeable symptoms. AD is highly heritable and genetic factors play an essential role in AD development. Large-scale genome-wide association studies have identified numerous SNPs related to AD. However, our understanding of the connections between genetic findings and altered brain phenotype is still limited. Brain imaging genetics, an emerging approach, aims to investigate the relationship between genetic variations and brain structure or function. It has great potential to provide insights into the underlying biological mechanisms and to enable the early detection of AD. Our study aimed to develop and apply novel computational approaches for more robust discovery of imaging genetics associations and for improved detection of AD at an early stage. Specifically, we focused on addressing the heterogeneity problem inherent in integrating imaging and genetics data. In aim 1, we applied a novel biclustering method to associate genetic variations with functional brain connectivity altered in AD patients. In aim 2, we proposed a novel strategy to integrate imaging and genetic data to serve as a new type of prior knowledge and investigated their role in guiding imaging genetics association. Finally, in aim 3, we proposed a multi-factorial pseudotime approach to integrate heterogeneous genotype and amyloid imaging data and examined its potential for staging and early detection of AD. Collectively, results from these objectives aimed to enhance our understanding and detection of AD, providing valuable information to inform therapeutic strategies to slow or halt disease progression.
  • Item
    Techniques for Improving the Robustness of Visual Analytics
    (2024-08) Koonchanok, Ratanond; Reda, Khairi; Chakraborty, Sunandan; Cafaro, Francesco; McCabe, Sean
    Interactive visualization systems, such as Tableau, are integral parts of the data analysis workflow. While such tools were built to help analysts perform exploratory data analysis with minimal effort, analysts have also been using them to make statistical inferences (e.g., predicting future trends) based on patterns revealed by the dataset. However, in addition to revealing true patterns, visualizations can also surface noise and other random fluctuations in data, which could lead to spurious discoveries. The latter poses a threat to the trustworthiness of analyses, especially given the increased reliance on visualizations across various domains. My central thesis is that it is possible to reduce the incidence of false discovery by introducing lightweight user interface elements in visualization tools. In particular, I propose eliciting and incorporating analyst beliefs into visualizations as an approach for guarding against spurious patterns and reducing the risk of analysts “overfitting” the data. To study how analysts would respond to such intervention, I first designed an interactive tool that combined visual belief elicitation with traditional visualization functionalities. In a qualitative study with data analysts, the tool appeared to allow users to operationalize their working knowledge into analyses, nudging them to adopt normative analysis practices (e.g., specifying hypotheses before peeking at data). I then conducted a crowdsourced experiment to investigate if this design could indeed help reduce the incidence of false discovery. Compared to a control condition, participants who used our intervention made significantly more accurate inferences and reported fewer false discoveries. Lastly, I investigated the capability of human intuition by comparing inferences from participants against those generated by statistical machines to understand the advantages and limitations of each. 
    Overall, my thesis paves the way toward the development of a robust visual analytics system that facilitates collaborative decision-making processes, leveraging the complementary abilities of humans and machines.
  • Item
    Predictive Molecular Biomarkers for Human Health Risk
    (2024-07) Jiang, Guanglong; Liu, Yunlong; Yan, Jingwen; Wan, Jun; Wang, Juexin
    Molecular biomarkers play vital roles in disease risk assessment, personalized treatment selection and therapy response monitoring. This thesis explores the use of diverse molecular biomarkers for the assessment of human health risks, primarily in cancers. MiRNAs and their isoforms (isomiRs) are promising biomarker candidates due to their comprehensive regulation of gene expression and involvement in physiological and pathological processes. The first study demonstrates that genetic variations in miRNA precursor regions influence the biogenesis of isomiRs in 95 SNP-isomiR pairs. Notably, we identified a SNP (rs6505162) impacting hsa-miR-423 isomiRs, potentially linked to breast cancer pathogenesis, suggesting their potential as biomarkers in disease assessment. The findings also highlight the mechanism of genetic regulation of isomiR generation and advance our understanding of miRNA-mediated post-transcriptional regulation. Secondly, we explored the capacity of aberrant intron-retention neoantigen burden (INB) to predict the response to immune checkpoint inhibitors (ICI) in metastatic cancers. Both INB and tumor mutation burden (TMB) were strong predictors of ICI therapy duration (p = 0.019 and 0.038, respectively), with patients exhibiting elevated levels demonstrating exceptional treatment duration. Patients with high INB or TMB had improved overall survival (OS) (p = 1.1×10⁻⁴). Importantly, INB and TMB were uncorrelated, indicating that they capture distinct aspects of tumor neoantigens. Together, the combined assessment of INB and TMB offers improved accuracy in predicting clinical response to ICI therapies. Finally, we extend the application of molecular biomarkers to the assessment of minimal residual disease for risk stratification in triple negative breast cancer (TNBC) with residual disease after neoadjuvant chemotherapy. Detection of circulating tumor DNA (ctDNA) was a significant predictor of inferior distant disease-free survival (DDFS) (p = 0.006), disease-free survival (DFS) (p = 0.009) and OS (p = 0.002). The combination of circulating tumor cell (CTC) and ctDNA markers provided superior sensitivity and prognostic value. In conclusion, the studies provide compelling evidence for the utility of diverse molecular biomarkers, including miRNA isoforms, abnormal splicing-based neoantigen metrics, and circulating tumor DNA, in disease prediction and treatment efficacy assessment. By elucidating the roles of diverse biomarkers in predicting cancer pathogenesis and therapeutic response, we pave the way towards more personalized and effective approaches to managing human health risks.
  • Item
    Computational Methods for Proteoform Identification and Characterization Using Top-Down Mass Spectrometry
    (2023-12) Chen, Wenrong; Yan, Jingwen; Wang, Juexin; Wan, Jun; Zang, Yong; Luo, Xiao; Liu, Xiaowen
    Proteoforms, distinct molecular forms of proteins, arise due to numerous factors such as genetic mutations, differential gene expression, alternative splicing, and a range of biological processes. These proteoforms are often characterized by primary structural variances such as amino acid substitutions, terminal truncations, and post-translational modifications (PTMs). Proteoforms from the same proteins can manifest varied functional behaviors based on the specific alterations. The complexity inherent to proteoforms has elevated the significance of top-down mass spectrometry (MS) due to its proficiency in providing intricate sequence information for these intact proteoforms. During a typical top-down MS experiment, intact proteoforms are separated through platforms like liquid chromatography (LC) or capillary zone electrophoresis (CZE) prior to tandem mass spectrometry (MS/MS) analysis. Despite advancements in instruments and protocols for top-down MS, computational challenges persist, with software tool development still in its early stage. In this dissertation, our research revolves around three primary goals, all aimed at refining proteoform characterization. First, we bridge RNA-Seq with top-down MS for better proteoform identification. We propose TopPG, an innovative proteogenomic tool which, in contrast to traditional approaches, is tailored to generate proteoform sequence databases from genetic and splicing variations explicitly for top-down MS. Second, to boost the accuracy of proteoform detection, we utilize machine learning methods to predict proteoform retention and migration times in top-down MS, critically evaluating models in an area traditionally dominated by bottom-up MS methodologies. Lastly, recognizing the indispensable role of PTMs in cellular functions, we introduce PTM-TBA. This tool integrates the complementary strengths of both top-down and bottom-up MS, augmented with annotations, building a comprehensive strategy for precise PTM identification and localization.
  • Item
    Survivor-Centered Transformative Justice: An Approach to Designing Sociotechnical Systems Alongside Domestic Violence Stakeholders in US Muslim Communities
    (2023-08) Rabaan, Hawra; Dombrowski, Lynn; Bolchini, Davide; Brady, Erin; Khaja, Khadija; Schoenebeck, Sarita
    Domestic violence (DV) is a social, political, and legal problem that requires contextual examination. In the United States, earlier advocacy work focused on law reform to empower survivors in influencing the public and state to take DV seriously and provide resources to support and protect survivors. However, harm is still perpetuated systemically and socially for survivors, especially those from racial and religious minorities. In this dissertation, I focus on domestic violence within the US-based Muslim population due to the unique issues Muslim survivors face when dealing with governmental services and service providers (e.g., gendered Islamophobia, racial discrimination, punitive actions) and within the Muslim community itself (e.g., community trauma, faith leaders lacking appropriate training). This work incorporates three phases of research that utilize qualitative and design methods to examine the forms and dynamics of domestic violence, help-seeking and healing challenges, and survivor advocacy, abuser accountability, and community transformation interventions. I argue that to pursue justice for survivors in design research, a multifaceted approach rooted in principles from Islamic feminism, trauma-informed care, and restorative and transformative justice tenets is needed. Consequently, I propose Survivor-Centered Transformative Justice (SCTJ), a framework to discern individual and systemic harm, to understand how to design alongside victim-survivors, and to focus on victim-survivors' autonomy. I illustrate how SCTJ allows researchers and designers to account for individual inequalities, recognize communities' preferred approaches to pursuing justice, tackle the underlying conditions enabling harm, and provide interventions that alter, repair, and reduce harm within different scales of relationships. Additionally, I present the concept of healing structures, which aim to safeguard against harmful community practices, discriminatory laws, and practices while facilitating collective and survivor-centered interventions to promote healing. Lastly, I demonstrate the potential for design research to progress by taking a closer look into the belief systems, cultural values, and surrounding conditions that contribute to users' obtainable choices and decision-making processes, and by centering the needs of people at the margins. With this empirical, theoretical, and design work, I present insights that inform the HCI community at the intersection of social justice-oriented design, Islamic feminism, and gender-based violence.
  • Item
    Understanding Informational Practices and Exploring Data Collection Approaches for Quality of Life in Brain Injury Illness Management
    (2023-07) Masterson, Yamini Lalama Patnaik; Brady, Erin; Miller, Andrew D.; Toscos, Tammy; Hong, Youngbok; Gunter, Tracy D.
    Brain injury, a combination of medical injury, chronic illness, and impairment, affects more than 3.5 million people in the United States every year through an interplay of physiological, psychological, environmental, and cultural factors spanning clinical recovery, illness management, and personal recovery phases. The lack of collaborative and integrated understanding from healthcare and accessibility communities led to treating brain injury as localized damage rather than an individual response to ever-changing impairment and symptoms, focusing primarily on clinical recovery until recently. While self-tracking and management technologies have been widely successful in measuring individual symptoms, they have struggled to facilitate sensemaking and problem solving to achieve a consistent biopsychosocial awareness of illness. My dissertation addresses this gap through three aims: (1) investigate the current informational practices of individuals undergoing post-acute brain injury recovery, (2) explore technology-agnostic approaches for data collection and their impact on sensemaking processes and conceptual understanding of brain injury, and (3) develop guidelines for designing data collection tools that facilitate sensemaking in brain injury self-management. I achieve this through two longitudinal studies: an interview study that introduced participants to the framework on quality of life after traumatic brain injury (QoLIBRI) and a narrative study that used the QoLIBRI framework to do structured journaling and co-design individualized data collection tools. The goal of this work is to improve the self-awareness of individuals with brain injury, enabling them to anticipate or recognize the occurrence of a challenge caused by impairment and then utilize assistive technologies to bypass the limitation. It also has implications for involving neurodiverse populations in research and technology design.
  • Item
    Celltyper: A Single-Cell Sequencing Marker Gene Tool Suite
    (2023-05) Paisley, Brianna Meadow; Liu, Yunlong; Yan, Jingwen; Cao, Sha; Wang, Juexin; Carfagna, Mark
    Single-cell RNA-sequencing (scRNA-seq) has enabled researchers to study interindividual cellular heterogeneity, to explore disease impact on cellular composition of tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq analysis is to identify the cell type of individual cells. Accurate cell type identification is crucial for any scRNA-seq analysis to be valid, as incorrect cell type assignment will reduce statistical robustness and may lead to incorrect biological conclusions. Therefore, accurate and comprehensive cell type assignment is necessary for reliable biological insights into scRNA-seq datasets. With over 200 distinct cell types in humans alone, the space of cell identities is large. Even within the same cell type there exists heterogeneity due to cell cycle phase, cell state, cell subtypes, cell health and the tissue microenvironment. This makes cell type classification a complicated biological problem requiring bioinformatics. One approach to classifying cell type identity is to use marker genes. Marker genes are genes specific for one or a few cell types. When coupled with bioinformatic methods, marker genes show promise for improving cell type classification. However, current scRNA-seq classification methods and databases use marker genes that are non-specific across sources, samples, and/or species, leading to bias and errors. Furthermore, many existing tools require manual intervention by the user to provide training datasets or the expected number and name of cell types, which can introduce selection bias. The selection bias negatively impacts the accuracy of cell type classification methods, as the model cannot extrapolate outside of the user inputs even when it is biologically meaningful to do so. In this dissertation I developed CellTypeR, a suite of tools to explore the biology governing cell identity in a “normal” state for humans and mice. The work presented here accomplishes three aims: (1) develop an ontology-standardized database of published marker gene literature; (2) develop and apply a marker gene classification algorithm; and (3) create a user interface and input data structure for scRNA-seq cell type prediction.