IU Indianapolis ScholarWorks :: Browsing by Author "Kasthurirathne, Suranga N."

Browsing by Author "Kasthurirathne, Suranga N."

Now showing 1 - 10 of 18

A framework for a consistent and reproducible evaluation of manual review for patient matching algorithms
(Oxford University Press, 2022) Gupta, Agrayan K.; Kasthurirathne, Suranga N.; Xu, Huiping; Li, Xiaochun; Ruppert, Matthew M.; Harle, Christopher A.; Grannis, Shaun J.; Medicine, School of Medicine
Healthcare systems are hampered by incomplete and fragmented patient health records. Record linkage is widely accepted as a solution to improve the quality and completeness of patient records. However, no systematic approach exists for manually reviewing patient records to create gold standard record linkage data sets. We propose a robust framework for creating and evaluating manually reviewed gold standard data sets for measuring the performance of patient matching algorithms. Our 8-point approach covers data preprocessing, blocking, record adjudication, linkage evaluation, and reviewer characteristics. This framework can help record linkage method developers provide necessary transparency when creating and validating gold standard reference matching data sets. In turn, this transparency will support both the internal and external validity of record linkage studies and improve the robustness of new record linkage strategies.
An Adversorial Approach to Enable Re-Use of Machine Learning Models and Collaborative Research Efforts Using Synthetic Unstructured Free-Text Medical Data
(IOS, 2019) Kasthurirathne, Suranga N.; Dexter, Gregory; Grannis, Shaun J.; Epidemiology, School of Public Health
We leverage Generative Adversarial Networks (GAN) to produce synthetic free-text medical data with low re-identification risk, and apply these to replicate machine learning solutions. We trained GAN models to generate free-text cancer pathology reports. Decision models trained using synthetic datasets reported performance metrics that were statistically similar to those of models trained using original test data. Our results further the use of GANs to generate synthetic data for collaborative research and re-use of machine learning models.
Assessing the capacity of social determinants of health data to augment predictive models identifying patients in need of wraparound social services
(Oxford Press, 2018-01) Kasthurirathne, Suranga N.; Vest, Joshua R.; Menachemi, Nir; Halverson, Paul K.; Grannis, Shaun J.; Health Policy and Management, School of Public Health
Introduction: A growing variety of diverse data sources is emerging to better inform health care delivery and health outcomes. We sought to evaluate the capacity for clinical, socioeconomic, and public health data sources to predict the need for various social service referrals among patients at a safety-net hospital. Materials and Methods: We integrated patient clinical data and community-level data representing patients’ social determinants of health (SDH) obtained from multiple sources to build random forest decision models to predict the need for any, mental health, dietitian, social work, or other SDH service referrals. To assess the impact of SDH on improving performance, we built separate decision models using clinical and SDH determinants and clinical data only. Results: Decision models predicting the need for any, mental health, and dietitian referrals yielded sensitivity, specificity, and accuracy measures ranging between 60% and 75%. Specificity and accuracy scores for social work and other SDH services ranged between 67% and 77%, while sensitivity scores were between 50% and 63%. Area under the receiver operating characteristic curve values for the decision models ranged between 70% and 78%. Models for predicting the need for any services reported positive predictive values between 65% and 73%. Positive predictive values for predicting individual outcomes were below 40%. Discussion: The need for various social service referrals can be predicted with considerable accuracy using a wide range of readily available clinical and community data that measure socioeconomic and public health conditions. While the use of SDH did not result in significant performance improvements, our approach represents a novel and important application of risk predictive modeling.
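The performance measures named in this abstract (sensitivity, specificity, accuracy, and positive predictive value) all derive from the four cells of a confusion matrix. The following is a minimal sketch of those definitions, not the authors' code; the function name and the counts are hypothetical and chosen only to show how a positive predictive value can be low even when sensitivity and specificity look reasonable, as happens when the outcome is uncommon.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion-matrix counts."""
    return {
        # Share of truly positive cases the model flags
        "sensitivity": tp / (tp + fn),
        # Share of truly negative cases the model leaves unflagged
        "specificity": tn / (tn + fp),
        # Share of all cases classified correctly
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of flagged cases that are truly positive
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts (NOT taken from the study): 100 positives, 900 negatives
metrics = classification_metrics(tp=60, fp=180, tn=720, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, sensitivity is 0.60 and specificity 0.80, yet only one in four flagged patients is a true positive.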
Comparison of Supervised Machine Learning and Probabilistic Approaches for Record Linkage
(AMIA Informatics summit 2019 Conference Proceedings, 2020-03-25) McNutt, Andrew T.; Grannis, Shaun J.; Bo, Na; Xu, Huiping; Kasthurirathne, Suranga N.
Record linkage is vital to prevent fragmentation of patient data. Machine learning approaches present considerable potential for record linkage. We compared the performance of three machine learning algorithms to an established probabilistic record linkage technique. Machine learning approaches exhibited results that were comparable to, or statistically superior to, those of the established probabilistic approach. It is unclear if the cost of manually reviewing datasets for supervised learning is justified by the performance improvements they yield.
Development and validation of computable social phenotypes for health-related social needs
(Oxford University Press, 2025-01-07) Gregory, Megan E.; Kasthurirathne, Suranga N.; Magoc, Tanja; McNamee, Cassidy; Harle, Christopher A.; Vest, Joshua R.; Health Policy and Management, Richard M. Fairbanks School of Public Health
Objective: Measurement of health-related social needs (HRSNs) is complex. We sought to develop and validate computable phenotypes (CPs) using structured electronic health record (EHR) data for food insecurity, housing instability, financial insecurity, transportation barriers, and a composite-type measure of these, using human-defined rule-based and machine learning (ML) classifier approaches. Materials and methods: We collected HRSN surveys as the reference standard and obtained EHR data from 1550 patients in 3 health systems from 2 states. We followed a Delphi-like approach to develop the human-defined rule-based CP. For the ML classifier approach, we trained supervised ML (XGBoost) models using 78 features. Using surveys as the reference standard, we calculated sensitivity, specificity, positive predictive values, and area under the curve (AUC). We compared AUCs using the Delong test and other performance measures using McNemar's test, and checked for differential performance. Results: Most patients (63%) reported at least one HRSN on the reference standard survey. Human-defined rule-based CPs exhibited poor performance (AUCs=.52 to .68). ML classifier CPs performed significantly better, but still poor-to-fair (AUCs = .68 to .75). Significant differences for race/ethnicity were found for ML classifier CPs (higher AUCs for White non-Hispanic patients). Important features included number of encounters and Medicaid insurance. Discussion: Using a supervised ML classifier approach, HRSN CPs approached thresholds of fair performance, but exhibited differential performance by race/ethnicity. Conclusion: CPs may help to identify patients who may benefit from additional social needs screening. Future work should explore the use of area-level features via geospatial data and natural language processing to improve model performance.
Development of a FHIR Based Application Programming Interface for Aggregate-Level Social Determinants of Health
(AMIA Informatics summit 2019 Conference Proceedings, 2019-03-25) Kasthurirathne, Suranga N.; Cormer, Karen F.; Devadasan, Neil; Biondich, Paul G.
Evaluation of a Parsimonious COVID-19 Outbreak Prediction Model: Heuristic Modeling Approach Using Publicly Available Data Sets
(JMIR, 2021-07) Gupta, Agrayan K.; Grannis, Shaun J.; Kasthurirathne, Suranga N.; Family Medicine, School of Medicine
Background: The COVID-19 pandemic has changed public health policies and human and community behaviors through lockdowns and mandates. Governments are rapidly evolving policies to increase hospital capacity and supply personal protective equipment and other equipment to mitigate disease spread in affected regions. Current models that predict COVID-19 case counts and spread are complex by nature and offer limited explainability and generalizability. This has highlighted the need for accurate and robust outbreak prediction models that balance model parsimony and performance. Objective: We sought to leverage readily accessible data sets extracted from multiple states to train and evaluate a parsimonious predictive model capable of identifying county-level risk of COVID-19 outbreaks on a day-to-day basis. Methods: Our modeling approach leveraged the following data inputs: COVID-19 case counts per county per day and county populations. We developed an outbreak gold standard across California, Indiana, and Iowa. The model utilized a per capita running 7-day sum of the case counts per county per day and the mean cumulative case count to develop baseline values. The model was trained with data recorded between March 1 and August 31, 2020, and tested on data recorded between September 1 and October 31, 2020. Results: The model reported sensitivities of 81%, 92%, and 90% for California, Indiana, and Iowa, respectively. The precision in each state was above 85% while specificity and accuracy scores were generally >95%. Conclusions: Our parsimonious model provides a generalizable and simple alternative approach to outbreak prediction. This methodology can be applied to diverse regions to help state officials and hospitals with resource allocation and to guide risk management, community education, and mitigation strategies.
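The core input described in the Methods, a per capita running 7-day sum of daily case counts for a county, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' model: the function names, the per-100,000 scaling, and the fixed `baseline` threshold are all hypothetical, since the abstract does not specify how baseline values were turned into an outbreak flag.

```python
from collections import deque


def per_capita_7day_sum(daily_cases, population, window=7, per=100_000):
    """Running `window`-day sum of daily case counts, scaled per `per` residents."""
    recent = deque(maxlen=window)  # automatically drops counts older than the window
    series = []
    for count in daily_cases:
        recent.append(count)
        series.append(sum(recent) * per / population)
    return series


def flag_outbreak_days(daily_cases, population, baseline):
    """Flag days whose running per capita sum exceeds a baseline (hypothetical rule)."""
    return [value > baseline for value in per_capita_7day_sum(daily_cases, population)]


# Hypothetical county: 50,000 residents, eight days of made-up case counts
cases = [1, 2, 3, 4, 5, 6, 7, 8]
rolling = per_capita_7day_sum(cases, population=50_000)
flags = flag_outbreak_days(cases, population=50_000, baseline=40)
print(rolling)
print(flags)
```

Scaling by population lets small and large counties share one threshold, which is one plausible reason a per capita measure suits a model meant to generalize across states.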
An Evaluation of Activity Trackers for Monitoring Parkinson's Disease Patient Outcomes
(2016) Jones, Josette F.; Wu, Huanmei; Patel, Jay; Kasthurirathne, Suranga N.; Binkhedar, Samar; Thai, Nicole; Mukherjee, Sunanda
Parkinson's disease (PD) is the second most common neurodegenerative disease in America. PD results in adverse outcomes including motor impairments and non-motor impairments such as cognition and sleep. Medication has a limited impact on treating PD and slowing down the progression of the disease. Anecdotal reports and some research show that intensive activity has been beneficial not only in slowing down the progression of PD but possibly also in reversing the onset of more severe PD symptoms.
Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models
(AMIA Informatics summit 2021 Conference Proceedings, 2021-03) Kasthurirathne, Suranga N.; Dexter, Gregory; Grannis, Shaun J.
Restrictions in sharing Patient Health Identifiers (PHI) limit cross-organizational re-use of free-text medical data. We leverage Generative Adversarial Networks (GAN) to produce synthetic unstructured free-text medical data with low re-identification risk, and assess the suitability of these datasets to replicate machine learning models. We trained GAN models using unstructured free-text laboratory messages pertaining to salmonella, and identified the most accurate models for creating synthetic datasets that reflect the informational characteristics of the original dataset. Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics. There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets. Our results inform the use of GAN models to generate synthetic unstructured free-text data with limited re-identification risk, and use of this data to enable collaborative research and re-use of machine learning models.
Identification of Patients in Need of Advanced Care for Depression Using Data Extracted From a Statewide Health Information Exchange: A Machine Learning Approach
(JMIR Publications, 2019-07-22) Kasthurirathne, Suranga N.; Biondich, Paul G.; Grannis, Shaun J.; Purkayastha, Saptarshi; Vest, Joshua R.; Jones, Josette F.; Epidemiology, School of Public Health
BACKGROUND: As the most commonly occurring form of mental illness worldwide, depression poses significant health and economic burdens to both the individual and community. Different types of depression pose different levels of risk. Individuals who suffer from mild forms of depression may recover without any assistance or be effectively managed by primary care or family practitioners. However, other forms of depression are far more severe and require advanced care by certified mental health providers. Identifying cases of depression that require advanced care may be challenging to primary care providers and health care team members whose skill sets run broad rather than deep. OBJECTIVE: This study aimed to leverage a comprehensive range of patient-level diagnostic, behavioral, and demographic data, as well as past visit history data from a statewide health information exchange to build decision models capable of predicting the need of advanced care for depression across patients presenting at Eskenazi Health, the public safety net health system for Marion County, Indianapolis, Indiana. METHODS: Patient-level diagnostic, behavioral, demographic, and past visit history data extracted from structured datasets were merged with outcome variables extracted from unstructured free-text datasets and were used to train random forest decision models that predicted the need of advanced care for depression across (1) the overall patient population and (2) various subsets of patients at higher risk for depression-related adverse events: patients with a past diagnosis of depression; patients with a Charlson comorbidity index of ≥1; patients with a Charlson comorbidity index of ≥2; and all unique patients identified across the 3 above-mentioned high-risk groups. RESULTS: The overall patient population consisted of 84,317 adult (aged ≥18 years) patients. A total of 6992 (8.29%) of these patients were in need of advanced care for depression. Decision models for high-risk patient groups yielded area under the curve (AUC) scores between 86.31% and 94.43%. The decision model for the overall patient population yielded a comparatively lower AUC score of 78.87%. The range of optimal sensitivity and specificity for all decision models, as identified using Youden J Index, is as follows: sensitivity=68.79% to 83.91% and specificity=76.03% to 92.18%. CONCLUSIONS: This study demonstrates the ability to automate screening for patients in need of advanced care for depression across (1) an overall patient population or (2) various high-risk patient groups using structured datasets covering acute and chronic conditions, patient demographics, behaviors, and past visit history. Furthermore, these results show considerable potential to enable preventative care and can be easily integrated into existing clinical workflows to improve access to wraparound health care services.
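The Youden J Index mentioned in the Results picks the classification threshold that maximizes J = sensitivity + specificity - 1. The sketch below shows that selection over a small set of predicted scores; it is not the authors' implementation, and the function name, labels, and scores are hypothetical.

```python
def youden_optimal_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is predicted positive when its score is >= the threshold.
    Assumes y_true contains at least one 1 and at least one 0.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        # Keep the first threshold that achieves the highest J
        if best is None or j > best[0]:
            best = (j, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical labels (1 = needs advanced care) and model scores, for illustration only
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold, sens, spec = youden_optimal_threshold(labels, scores)
print(threshold, sens, spec)
```

Because J weights sensitivity and specificity equally, the chosen operating point may differ from one selected when missed cases and false alarms carry different costs.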
