A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage

dc.contributor.author: Xu, Huiping
dc.contributor.author: Li, Xiaochun
dc.contributor.author: Grannis, Shaun
dc.contributor.department: Biostatistics, School of Public Health
dc.date.accessioned: 2023-08-02T14:31:42Z
dc.date.available: 2023-08-02T14:31:42Z
dc.date.issued: 2021-05-04
dc.description.abstract: The widely used Fellegi-Sunter model for probabilistic record linkage does not leverage information contained in field values and consequently leads to identical classification of match status regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of error probabilities associated with field values and frequencies of agreed field values among matches, often derived using prior studies or training data. When such information is unavailable, applications of these methods are challenging. In this paper, we propose a simple two-step procedure for frequency-based matching using the Fellegi-Sunter framework to overcome these challenges. Matching weights are adjusted based on frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi-Sunter model without relying on prior studies or training data. Through a real-world application and simulation, our method is found to produce comparable or better performance than the unadjusted method. Furthermore, frequency-based matching provides greater improvement in matching accuracy when using poorly discriminating fields, with diminished benefit as the discriminating power of matching fields increases.
dc.eprint.version: Final published version
dc.identifier.citation: Xu H, Li X, Grannis S. A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage. J Appl Stat. 2021;49(11):2789-2804. Published 2021 May 4. doi:10.1080/02664763.2021.1922615
dc.identifier.uri: https://hdl.handle.net/1805/34684
dc.language.iso: en_US
dc.publisher: Taylor & Francis
dc.relation.isversionof: 10.1080/02664763.2021.1922615
dc.relation.journal: Journal of Applied Statistics
dc.rights: Publisher Policy
dc.source: PMC
dc.subject: Fellegi–Sunter model
dc.subject: Frequency-based matching
dc.subject: Latent class analysis
dc.subject: Probabilistic matching
dc.subject: Record linkage
dc.title: A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage
dc.type: Article
ul.alternative.fulltext: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9336505/
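The abstract describes adjusting matching weights using the frequency distributions of agreed field values among matches and non-matches, both estimated from a fitted Fellegi-Sunter model. The following is a minimal sketch of that general idea for a single field, not the authors' implementation. It assumes that step 1 (fitting the Fellegi-Sunter model, e.g. by EM) has already produced a posterior match probability for each record pair and field-level agreement probabilities m and u, and it assumes the adjusted agreement weight takes the form log2(m·f_M(v) / (u·f_U(v))), where f_M(v) and f_U(v) are the shares of value v among agreeing matches and non-matches. The function name `frequency_adjusted_weights` and the surname values are hypothetical.

```python
import math
from collections import defaultdict


def frequency_adjusted_weights(pairs, m, u, eps=1e-12):
    """Hypothetical sketch of value-specific agreement weights for one field.

    pairs: list of (agreed_value, posterior_match_prob) for record pairs that
           AGREE on this field; the posterior probabilities are ASSUMED to come
           from a previously fitted Fellegi-Sunter model (step 1).
    m, u:  field-level P(agree | match) and P(agree | non-match) from that
           same fitted model (assumed inputs).
    eps:   floor that avoids division by zero when a frequency is estimated as 0.
    Returns {value: adjusted log2 agreement weight}.
    """
    match_mass = defaultdict(float)     # expected number of matches per agreed value
    nonmatch_mass = defaultdict(float)  # expected number of non-matches per agreed value
    for value, g in pairs:
        match_mass[value] += g
        nonmatch_mass[value] += 1.0 - g

    total_m = sum(match_mass.values())
    total_u = sum(nonmatch_mass.values())

    weights = {}
    for value in match_mass:
        # Estimated frequency of this value among agreeing matches / non-matches.
        f_m = max(match_mass[value] / max(total_m, eps), eps)
        f_u = max(nonmatch_mass[value] / max(total_u, eps), eps)
        # Assumed form: P(agree on v | M) = m * f_M(v), P(agree on v | U) = u * f_U(v).
        weights[value] = math.log2((m * f_m) / (u * f_u))
    return weights


if __name__ == "__main__":
    # Toy data: a common surname agreed on by several pairs, a rare one by a single pair.
    toy_pairs = [("SMITH", 0.6), ("SMITH", 0.4), ("SMITH", 0.2), ("ZYGMUNT", 0.9)]
    for value, w in frequency_adjusted_weights(toy_pairs, m=0.9, u=0.1).items():
        print(value, round(w, 2))
```

With m = 0.9 and u = 0.1, the unadjusted agreement weight is log2(9) ≈ 3.17 for every value. In this toy example, the rare value receives a larger adjusted weight and the common value a smaller one, which mirrors the motivation stated in the abstract. The disagreement weight, log2((1 − m)/(1 − u)), is left unchanged in this sketch.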
Files
Original bundle
Name: CJAS_49_1922615.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission