Browsing by Subject "Statistical models"

A practical approach for incorporating dependence among fields in probabilistic record linkage
(Springer Nature, 2013-08-30) Daggy, Joanne K.; Xu, Huiping; Hui, Siu L.; Gamache, Roland E.; Grannis, Shaun J.; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Background: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence.
Methods: We present a step-by-step guide to identify important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry for a large county health department.
Results: Our method could be readily implemented using standard software (with code supplied) to produce an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates.
Conclusions: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use due to its flexibility for incorporating conditional dependence and easy implementation using existing software.

Identification of colorectal cancer using structured and free text clinical data
(Sage, 2022) Redd, Douglas F.; Shao, Yijun; Zeng-Treitler, Qing; Myers, Laura J.; Barker, Barry C.; Nelson, Stuart J.; Imperiale, Thomas F.; Medicine, School of Medicine
Colorectal cancer incidence has continually fallen among those 50 years old and over.
However, the incidence has increased in those under 50. Even with the recent screening guidelines recommending that screening begins at age 45, nearly half of all early-onset colorectal cancer will be missed. Methods are needed to identify high-risk individuals in this age group for targeted screening. Colorectal cancer studies, as with other clinical studies, have required labor intensive chart review for the identification of those affected and risk factors. Natural language processing and machine learning can be used to automate the process and enable the screening of large numbers of patients. This study developed and compared four machine learning and statistical models: logistic regression, support vector machine, random forest, and deep neural network, in their performance in classifying colorectal cancer patients. Excellent classification performance is achieved with AUCs over 97%.

Research Needs for Prognostic Modeling and Trajectory Analysis in Patients with Disorders of Consciousness
(Springer, 2021) Hammond, Flora M.; Katta-Charles, Sheryl; Russell, Mary Beth; Zafonte, Ross D.; Claassen, Jan; Wagner, Amy K.; Puybasset, Louis; Egawa, Satoshi; Laureys, Steven; Diringer, Michael; Stevens, Robert D.; Curing Coma Campaign and its Contributing Members; Physical Medicine and Rehabilitation, School of Medicine
Background: The current state of the science regarding the care and prognosis of patients with disorders of consciousness is limited. Scientific advances are needed to improve the accuracy, relevance, and approach to prognostication, thereby providing the foundation to develop meaningful and effective interventions.
Methods: To address this need, an interdisciplinary expert panel was created as part of the Coma Science Working Group of the Neurocritical Care Society Curing Coma Campaign.
Results: The panel performed a gap analysis which identified seven research needs for prognostic modeling and trajectory analysis ("recovery science") in patients with disorders of consciousness: (1) to define the variables that predict outcomes; (2) to define meaningful intermediate outcomes at specific time points for different endotypes; (3) to describe recovery trajectories in the absence of limitations to care; (4) to harness big data and develop analytic methods to prognosticate more accurately; (5) to identify key elements and processes for communicating prognostic uncertainty over time; (6) to identify health care delivery models that facilitate recovery and recovery science; and (7) to advocate for changes in the health care delivery system needed to advance recovery science and implement already-known best practices.
Conclusion: This report summarizes the current research available to inform the proposed research needs, articulates key elements within each area, and discusses the goals and advances in recovery science and care anticipated by successfully addressing these needs.

Spectral probabilities of top-down tandem mass spectra
(Springer Nature, 2014) Liu, Xiaowen; Segar, Matthew W.; Li, Shuai Cheng; Kim, Sangtae; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
Background: In mass spectrometry-based proteomics, the statistical significance of a peptide-spectrum or protein-spectrum match is an important indicator of the correctness of the peptide or protein identification. In bottom-up mass spectrometry, probabilistic models, such as the generating function method, have been successfully applied to compute the statistical significance of peptide-spectrum matches for short peptides containing no post-translational modifications.
As top-down mass spectrometry, which often identifies intact proteins with post-translational modifications, becomes available in many laboratories, the estimation of statistical significance of top-down protein identification results has come into great demand.
Results: In this paper, we study an extended generating function method for accurately computing the statistical significance of protein-spectrum matches with post-translational modifications. Experiments show that the extended generating function method achieves high accuracy in computing spectral probabilities and false discovery rates.
Conclusions: The extended generating function method is a non-trivial extension of the generating function method for bottom-up mass spectrometry. It can be used to choose the correct protein-spectrum match from several candidate protein-spectrum matches for a spectrum, as well as separate correct protein-spectrum matches from incorrect ones identified from a large number of tandem mass spectra.