Browsing by Author "Amor, Benjamin"
Privacy‐preserving record linkage across disparate institutions and datasets to enable a learning health system: The national COVID cohort collaborative (N3C) experience (Wiley, 2024-01-11)
Tachinardi, Umberto; Grannis, Shaun J.; Michael, Sam G.; Misquitta, Leonie; Dahlin, Jayme; Sheikh, Usman; Kho, Abel; Phua, Jasmin; Rogovin, Sara S.; Amor, Benjamin; Choudhury, Maya; Sparks, Philip; Mannaa, Amin; Ljazouli, Saad; Saltz, Joel; Prior, Fred; Baghal, Ahmen; Gersing, Kenneth; Embi, Peter J.; Medicine, School of Medicine
Introduction: Research driven by real-world clinical data is increasingly vital to enabling learning health systems, but integrating such data from across disparate health systems is challenging. As part of the NCATS National COVID Cohort Collaborative (N3C), the N3C Data Enclave was established as a centralized repository of deidentified and harmonized COVID-19 patient data from institutions across the US. However, making this data most useful for research requires linking it with information such as mortality data, images, and viral variants. The objective of this project was to establish privacy-preserving record linkage (PPRL) methods to ensure that patient-level EHR data remains secure and private when governance-approved linkages with other datasets occur.
Methods: Separate agreements and approval processes govern N3C data contribution and data access. The Linkage Honest Broker (LHB), an independent neutral party (the Regenstrief Institute), ensures data linkages are robust and secure by adding an extra layer of separation between protected health information and clinical data. The LHB's PPRL methods (including algorithms, processes, and governance) match patient records using "deidentified tokens," which are hashed combinations of identifier fields that define a match across data repositories without using patients' clear-text identifiers.
Results: These methods enable three linkage functions: Deduplication, Linking Multiple Datasets, and Cohort Discovery. To date, two external repositories have been cross-linked. As of March 1, 2023, 43 sites have signed the LHB Agreement; 35 sites have sent tokens generated for 9 528 998 patients. In this initial cohort, the LHB identified 135 037 matches and 68 596 duplicates.
Conclusion: This large-scale linkage study using deidentified datasets of varying characteristics established secure methods for protecting the privacy of N3C patient data when linked for research purposes. This technology has potential for use with registries for other diseases and conditions.

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment (Oxford University Press, 2021)
Haendel, Melissa A.; Chute, Christopher G.; Bennett, Tellen D.; Eichmann, David A.; Guinney, Justin; Kibbe, Warren A.; Payne, Philip R. O.; Pfaff, Emily R.; Robinson, Peter N.; Saltz, Joel H.; Spratt, Heidi; Suver, Christine; Wilbanks, John; Wilcox, Adam B.; Williams, Andrew E.; Wu, Chunlei; Blacketer, Clair; Bradford, Robert L.; Cimino, James J.; Clark, Marshall; Colmenares, Evan W.; Francis, Patricia A.; Gabriel, Davera; Graves, Alexis; Hemadri, Raju; Hong, Stephanie S.; Hripscak, George; Jiao, Dazhi; Klann, Jeffrey G.; Kostka, Kristin; Lee, Adam M.; Lehmann, Harold P.; Lingrey, Lora; Miller, Robert T.; Morris, Michele; Murphy, Shawn N.; Natarajan, Karthik; Palchuk, Matvey B.; Sheikh, Usman; Solbrig, Harold; Visweswaran, Shyam; Walden, Anita; Walters, Kellie M.; Weber, Griffin M.; Zhang, Xiaohan Tanner; Zhu, Richard L.; Amor, Benjamin; Girvin, Andrew T.; Manna, Amin; Qureshi, Nabeel; Kurilla, Michael G.; Michael, Sam G.; Portilla, Lili M.; Rutter, Joni L.; Austin, Christopher P.; Gersing, Ken R.; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that
require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.
Materials and methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.
Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.
Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.