- Browse by Author
Browsing by Author "Magoc, Tanja"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Enhancing an enterprise data warehouse for research with data extracted using natural language processing(Cambridge University Press, 2023-06-13) Magoc, Tanja; Everson, Russell; Harle, Christopher A.; Health Policy and Management, School of Public HealthObjective: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to EDW4R. Materials and methods: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was utilized for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. Results: Two weeks were needed to enhance EDW4R with data from 250 million clinical notes. NLP generated 16 and 39% increase in data availability for two variables. Discussion: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. Conclusion: Given the value added by data extracted using NLP, it is important to enhance EDW4R with these data to enable research teams without NLP expertise to benefit from value added by NLP models.Item Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals(Oxford University Press, 2022) Peng, Le; Luo, Gaoxiang; Walker, Andrew; Zaiman, Zachary; Jones, Emma K.; Gupta, Hemant; Kersten, Kristopher; Burns, John L.; Harle, Christopher A.; Magoc, Tanja; Shickel, Benjamin; Steenburg, Scott D.; Loftus, Tyler; Melton, Genevieve B.; Wawira Gichoya, Judy; Sun, Ju; Tignanelli, Christopher J.; Radiology and Imaging Sciences, School of MedicineObjective: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 19 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. Materials and methods: We leverage a FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data was pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model, versus FedAvg, and 3 personalized FL variations (FedProx, FedBN, and FedAMP). Results: We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. Conclusion: FedAvg can significantly improve the generalization of the model compared to other personalization FL algorithms; however, at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internal and externally validated algorithms.Item Evolving availability and standardization of patient attributes for matching(Oxford University Press, 2023-10-12) Deng, Yu; Gleason, Lacey P.; Culbertson, Adam; Chen, Xiaotian; Bernstam, Elmer V.; Cullen, Theresa; Gouripeddi, Ramkiran; Harle, Christopher; Hesse, David F.; Kean, Jacob; Lee, John; Magoc, Tanja; Meeker, Daniella; Ong, Toan; Pathak, Jyotishman; Rosenman, Marc; Rusie, Laura K.; Shah, Akash J.; Shi, Lizheng; Thomas, Aaron; Trick, William E.; Grannis, Shaun; Kho, Abel; Health Policy and Management, School of Public HealthVariation in availability, format, and standardization of patient attributes across health care organizations impacts patient-matching performance. We report on the changing nature of patient-matching features available from 2010-2020 across diverse care settings. We asked 38 health care provider organizations about their current patient attribute data-collection practices. All sites collected name, date of birth (DOB), address, and phone number. Name, DOB, current address, social security number (SSN), sex, and phone number were most commonly used for cross-provider patient matching. Electronic health record queries for a subset of 20 participating sites revealed that DOB, first name, last name, city, and postal codes were highly available (>90%) across health care organizations and time. SSN declined slightly in the last years of the study period. Birth sex, gender identity, language, country full name, country abbreviation, health insurance number, ethnicity, cell phone number, email address, and weight increased over 50% from 2010 to 2020. Understanding the wide variation in available patient attributes across care settings in the United States can guide selection and standardization efforts for improved patient matching in the United States.