Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models

dc.contributor.author: Kasthurirathne, Suranga N.
dc.contributor.author: Dexter, Gregory
dc.contributor.author: Grannis, Shaun J.
dc.date.accessioned: 2021-06-22T13:37:14Z
dc.date.available: 2021-06-22T13:37:14Z
dc.date.issued: 2021-03
dc.description.abstract: Restrictions in sharing Patient Health Identifiers (PHI) limit cross-organizational re-use of free-text medical data. We leverage Generative Adversarial Networks (GAN) to produce synthetic unstructured free-text medical data with low re-identification risk, and assess the suitability of these datasets to replicate machine learning models. We trained GAN models using unstructured free-text laboratory messages pertaining to salmonella, and identified the most accurate models for creating synthetic datasets that reflect the informational characteristics of the original dataset. Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics. There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets. Our results inform the use of GAN models to generate synthetic unstructured free-text data with limited re-identification risk, and use of this data to enable collaborative research and re-use of machine learning models.
dc.identifier.citation: Kasthurirathne, S. N., Dexter, G., & Grannis, S. J. (2021, March). Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models. AMIA Informatics Summit 2021 Conference Proceedings.
dc.identifier.uri: https://hdl.handle.net/1805/26152
dc.language.iso: en_US
dc.publisher: AMIA Informatics Summit 2021 Conference Proceedings
dc.subject: Machine learning
dc.subject: Generative Adversarial Networks
dc.subject: Free-text datasets
dc.subject: Synthetic data
dc.subject: Public health reporting
dc.title: Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models
dc.type: Presentation
Files
Original bundle
Name: gan.2021.paper.pdf
Size: 615.21 KB
Format: Adobe Portable Document Format
Description: Accepted Manuscript
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission
Description: