A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage

If you need an accessible version of this item, please email your request to digschol@iu.edu so that they may create one and provide it to you.
Date
2021-05-04
Language
American English
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
Taylor & Francis
Abstract

The widely used Fellegi-Sunter model for probabilistic record linkage does not leverage information contained in field values and consequently leads to identical classification of match status regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of error probabilities associated with field values and frequencies of agreed field values among matches, often derived using prior studies or training data. When such information is unavailable, applications of these methods are challenging. In this paper, we propose a simple two-step procedure for frequency-based matching using the Fellegi-Sunter framework to overcome these challenges. Matching weights are adjusted based on frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi-Sunter model without relying on prior studies or training data. Through a real-world application and simulation, our method is found to produce comparable or better performance than the unadjusted method. Furthermore, frequency-based matching provides greater improvement in matching accuracy when using poorly discriminating fields with diminished benefit as the discriminating power of matching fields increases.

Description
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
Xu H, Li X, Grannis S. A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage. J Appl Stat. 2021;49(11):2789-2804. Published 2021 May 4. doi:10.1080/02664763.2021.1922615
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Journal of Applied Statistics
Source
PMC
Alternative Title
Type
Article
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Final published version
This item is under embargo {{howLong}}