Browsing by Subject "Fellegi–Sunter model"
Item: A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage
(Taylor & Francis, 2021-05-04) Xu, Huiping; Li, Xiaochun; Grannis, Shaun; Biostatistics, School of Public Health
The widely used Fellegi-Sunter model for probabilistic record linkage does not leverage information contained in field values and consequently classifies match status identically regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of error probabilities associated with field values and frequencies of agreed field values among matches, often derived from prior studies or training data. When such information is unavailable, these methods are difficult to apply. In this paper, we propose a simple two-step procedure for frequency-based matching using the Fellegi-Sunter framework to overcome these challenges. Matching weights are adjusted based on frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi-Sunter model without relying on prior studies or training data. Through a real-world application and simulation, our method is found to produce comparable or better performance than the unadjusted method. Furthermore, frequency-based matching provides greater improvement in matching accuracy when using poorly discriminating fields, with diminished benefit as the discriminating power of matching fields increases.

Item: Score Test for Assessing the Conditional Dependence in Latent Class Models and its Application to Record Linkage
(Oxford, 2022-11) Xu, Huiping; Li, Xiaochun; Zhang, Zuoyi; Grannis, Shaun; Biostatistics and Health Data Science, School of Medicine
The Fellegi–Sunter model has been widely used in probabilistic record linkage despite its often invalid conditional independence assumption. Prior research has demonstrated that conditional dependence latent class models yield improved match performance when using the correct conditional dependence structure. With a misspecified conditional dependence structure, these models can yield worse performance. It is, therefore, critically important to correctly identify the conditional dependence structure. Existing methods for identifying the conditional dependence structure include the correlation residual plot, the log-odds ratio check, and the bivariate residual, all of which have been shown to perform inadequately. A bootstrap bivariate residual approach and a score test have also been proposed and found to perform better, with the score test having greater power and lower computational burden. In this paper, we extend the score-test-based approach to account for different conditional dependence structures. Through a simulation study, we develop practical recommendations on the utilisation of the score test and assess the match performance with conditional dependence identified by the proposed method. Performance of the proposed method is further evaluated using a real-world record linkage example. Findings show that the proposed method leads to improved matching accuracy relative to the Fellegi–Sunter model.
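
Both abstracts build on Fellegi-Sunter match weights, so a minimal sketch of the standard weight calculation may help readers new to the model. It is not the authors' two-step procedure or their score test. The function names, the m- and u-probabilities, and the value frequencies are hypothetical, and the value-specific weight uses a textbook simplification (chance agreement on a value approximated by that value's relative frequency) rather than anything estimated in these papers.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when a field agrees: log2(m / u).

    m = P(field agrees | records are a true match)
    u = P(field agrees | records are a non-match)
    """
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1.0 - m) / (1.0 - u))


def record_pair_score(agreements, m_probs, u_probs) -> float:
    """Sum field weights, which assumes conditional independence of fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


def value_specific_weight(m: float, value_freq: float) -> float:
    """Illustrative frequency-based weight (an assumption, not the papers' method).

    Chance agreement on a specific value is approximated by the value's
    relative frequency, so rare values receive larger weights.
    """
    return math.log2(m / value_freq)


if __name__ == "__main__":
    # Hypothetical parameters for two matching fields.
    m_probs = {"surname": 0.95, "birth_year": 0.90}
    u_probs = {"surname": 0.01, "birth_year": 0.05}
    pair = {"surname": True, "birth_year": False}
    print("pair score:", record_pair_score(pair, m_probs, u_probs))

    # A rare surname outweighs a common one under the simplification above.
    print("rare value:", value_specific_weight(0.95, 0.001))
    print("common value:", value_specific_weight(0.95, 0.05))
```

Summing the field weights is exactly where the conditional independence assumption enters. The second abstract addresses the case where that assumption fails.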