Browsing by Author "Yang, Baijian"
Now showing 1 - 6 of 6
Item ARDaC Common Data Model Facilitates Data Dissemination and Enables Data Commons for Modern Clinical Studies (IOS Press, 2024)
Jin, Nanxin; Li, Zuotian; Kettler, Carla; Yang, Baijian; Tu, Wanzhu; Su, Jing; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Modern clinical studies collect longitudinal and multimodal data about participants, treatments and responses, biospecimens, and molecular and multiomics data. Such rich and complex data require new common data models (CDMs) to support data dissemination and research collaboration. We have developed the ARDaC CDM for the Alcoholic Hepatitis Network (AlcHepNet) Research Data Commons (ARDaC) to support clinical studies and translational research in the national AlcHepNet consortium. The ARDaC CDM bridges the gap between the data models used by the AlcHepNet electronic data capture platform (REDCap) and the Genomic Data Commons (GDC) data model used by the Gen3 data commons framework. It extends the GDC data model for clinical studies; facilitates the harmonization of research data across consortia and programs; and supports the development of the ARDaC. The ARDaC CDM is designed as a general and extensible CDM for addressing the needs of modern clinical studies. The ARDaC CDM is available at https://dev.ardac.org/DD.

Item Feature Selection for Unsupervised Machine Learning (IEEE, 2023)
Huang, Huyunting; Tang, Ziyang; Zhang, Tonglin; Yang, Baijian; Song, Qianqian; Su, Jing; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Compared to supervised machine learning (ML), the development of feature selection for unsupervised ML lags far behind. To address this issue, the current research proposes a stepwise feature selection approach for clustering methods, specialized to the Gaussian mixture model (GMM) and k-means.
Unlike the existing GMM and k-means, which are carried out on all the features, the proposed method selects a subset of features on which to run each of the two methods. The research finds that a better result can be obtained if the existing GMM and k-means methods are modified with good initializations. Experiments based on Monte Carlo simulations show that the proposed method is more computationally efficient and more accurate than the existing GMM and k-means methods based on all the features. An experiment on a real-world dataset confirms this finding.

Item PINet: Privileged Information Improve the Interpretablity and generalization of structural MRI in Alzheimer’s Disease (Association for Computing Machinery, 2023)
Tang, Zijia; Zhang, Tonglin; Song, Qianqian; Su, Jing; Yang, Baijian; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
The irreversible and progressive atrophy caused by Alzheimer’s Disease (AD) results in a continuous decline in thinking and behavioral skills. To date, CNN classifiers have been widely applied to assist the early diagnosis of AD and the identification of its associated abnormal structures. However, most existing black-box CNN classifiers rely heavily on a limited number of MRI scans and use little domain knowledge from previous clinical findings. In this study, we proposed a framework, named PINet, that treats this domain knowledge as Privileged Information (PI) and opens the black box in the prediction process. The input domain knowledge guides the neural network to learn representative features and introduces interpretability for further analysis. PINet used a Transformer-like fusion module, Privileged Information Fusion (PIF), to iteratively calculate the correlation between image features and PI features and to project the features into a latent space for classification.
The Pyramid Feature Visualization (PFV) module served as a verification step, highlighting the significant features on the input images. PINet was suitable for neuro-imaging tasks, and we demonstrated its application in Alzheimer’s Disease using structural MRI scans from the ADNI dataset. During the experiments, we employed abnormal brain structures such as the Hippocampus as the PI, trained the model with data from 1.5T scanners, and tested it on data from 3T scanners. The F1-score showed that PINet was more robust in transferring to a new dataset, with an approximately 2% drop (from 0.9471 to 0.9231), while the baseline CNN methods had a 29% drop (from 0.8679 to 0.6154). The performance of PINet relied on the selection of the domain knowledge used as the PI. Our best model was trained under the guidance of 12 selected ROIs, mainly in the structures of the Temporal Lobe and Occipital Lobe. In summary, PINet treated the domain knowledge as the PI to train the CNN model, and the selected PI introduced both interpretability and generalization ability to black-box CNN classifiers.

Item SiGra: single-cell spatial elucidation through an image-augmented graph transformer (Springer Nature, 2023-09-12)
Tang, Ziyang; Li, Zuotian; Hou, Tieying; Zhang, Tonglin; Yang, Baijian; Su, Jing; Song, Qianqian; Biostatistics and Health Data Science, School of Medicine
Recent advances in high-throughput molecular imaging have pushed spatial transcriptomics technologies to subcellular resolution, which surpasses the limitations of both single-cell RNA-seq and array-based spatial profiling. The multichannel immunohistochemistry images in such data provide rich information on the cell types, functions, and morphologies of cellular compartments. In this work, we developed a method, single-cell spatial elucidation through image-augmented Graph transformer (SiGra), to leverage such imaging information for revealing spatial domains and enhancing substantially sparse and noisy transcriptomics data.
SiGra applies hybrid graph transformers over a single-cell spatial graph. SiGra outperforms state-of-the-art methods on both single-cell and spot-level spatial transcriptomics data from complex tissues. The inclusion of immunohistochemistry images improves the model performance by 37% (95% CI: 27-50%). SiGra improves the characterization of intratumor heterogeneity and intercellular communication and recovers the known microscopic anatomy. Overall, SiGra effectively integrates different spatial modality data to gain deep insights into spatial cellular ecosystems.

Item spaCI: deciphering spatial cellular communications through adaptive graph model (Oxford University Press, 2023)
Tang, Ziyang; Zhang, Tonglin; Yang, Baijian; Su, Jing; Song, Qianqian; Biostatistics, School of Public Health
Cell–cell communications are vital for biological signalling and play important roles in complex diseases. Recent advances in single-cell spatial transcriptomics (SCST) technologies allow examining the spatial cell communication landscapes and hold promise for disentangling the complex ligand–receptor (L–R) interactions across cells. However, due to frequent dropout events and noisy signals in SCST data, accurately inferring cellular communications is challenging, and effective, tailored methods are lacking. Herein, to decipher the cell-to-cell communications from SCST profiles, we propose a novel adaptive graph model with attention mechanisms named spaCI. spaCI incorporates both spatial locations and gene expression profiles of cells to identify the active L–R signalling axis across neighbouring cells. Through benchmarking with currently available methods, spaCI shows superior performance on both simulation data and real SCST datasets. Furthermore, spaCI is able to identify the upstream transcriptional factors mediating the active L–R interactions.
For biological insights, we have applied spaCI to the seqFISH+ data of mouse cortex and the NanoString CosMx Spatial Molecular Imager (SMI) data of non-small cell lung cancer samples. spaCI reveals the hidden L–R interactions from the sparse seqFISH+ data and identifies inconspicuous L–R interactions, including THBS1−ITGB1 between fibroblasts and tumours, in the NanoString CosMx SMI data. spaCI further reveals that SMAD3 plays an important role in regulating the crosstalk between fibroblasts and tumours, which contributes to the prognosis of lung cancer patients. Collectively, spaCI addresses the challenges in interrogating SCST data for gaining insights into the underlying cellular communications, thus facilitating the discovery of disease mechanisms, effective biomarkers and therapeutic targets.

Item SpaRx: elucidate single-cell spatial heterogeneity of drug responses for personalized treatment (Oxford University Press, 2023)
Tang, Ziyang; Liu, Xiang; Li, Zuotian; Zhang, Tonglin; Yang, Baijian; Su, Jing; Song, Qianqian; Biostatistics and Health Data Science, School of Medicine
Spatial cellular heterogeneity contributes to differential drug responses in a tumor lesion and potential therapeutic resistance. Recently emerging spatial technologies such as CosMx, MERSCOPE and Xenium delineate the spatial gene expression patterns at single-cell resolution. This provides unprecedented opportunities to identify spatially localized cellular resistance and to optimize the treatment for individual patients. In this work, we present a graph-based domain adaptation model, SpaRx, to reveal the heterogeneity of spatial cellular response to drugs. SpaRx transfers the knowledge from pharmacogenomics profiles to single-cell spatial transcriptomics data, through hybrid learning with dynamic adversarial adaption. Comprehensive benchmarking demonstrates the superior and robust performance of SpaRx at different dropout rates, noise levels and transcriptomics coverage.
Further application of SpaRx to state-of-the-art single-cell spatial transcriptomics data reveals that tumor cells in different locations of a tumor lesion present heterogeneous sensitivity or resistance to drugs. Moreover, resistant tumor cells interact with themselves or the surrounding constituents to form an ecosystem for drug resistance. Collectively, SpaRx characterizes the spatial therapeutic variability, unveils the molecular mechanisms underpinning drug resistance and identifies personalized drug targets and effective drug combinations.