Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter?

dc.contributor.author: Xu, Huiping
dc.contributor.author: Li, Xiaochun
dc.contributor.author: Shen, Changyu
dc.contributor.author: Hui, Siu L.
dc.contributor.author: Grannis, Shaun
dc.contributor.department: Family Medicine, School of Medicine
dc.date.accessioned: 2020-11-06T20:53:44Z
dc.date.available: 2020-11-06T20:53:44Z
dc.date.issued: 2019
dc.description.abstract: The conditional independence assumption of the Fellegi and Sunter (FS) model in probabilistic record linkage is often violated when matching real-world data. Ignoring conditional dependence has been shown to seriously bias parameter estimates. However, in record linkage, the ultimate goal is to inform the match status of record pairs, and therefore record linkage algorithms should be evaluated in terms of matching accuracy. In the literature, more flexible models have been proposed to relax the conditional independence assumption, but few studies have assessed whether such accommodations improve matching accuracy. In this paper, we show, using three real-world data linkage examples, that incorporating the conditional dependence appropriately yields matching accuracy comparable to or better than that of the FS model. Through a simulation study, we further investigate when conditional dependence models provide improved matching accuracy. Our study shows that the FS model is generally robust to violations of the conditional independence assumption and provides matching accuracy comparable to that of the more complex conditional dependence models. However, when the match prevalence approaches 0% or 100% and conditional dependence exists in the dominating class, it is necessary to address conditional dependence, as the FS model produces suboptimal matching accuracy. The need to address conditional dependence becomes less important when highly discriminating fields are used. Our simulation study also shows that conditional dependence models with a misspecified dependence structure could produce less accurate record matching than the FS model, and therefore we caution against the blind use of conditional dependence models.
dc.eprint.version: Final published version
dc.identifier.citation: Xu, H., Li, X., Shen, C., Hui, S. L., & Grannis, S. (2019). Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter? Annals of Applied Statistics, 13(3), 1753–1790. https://doi.org/10.1214/19-AOAS1256
dc.identifier.uri: https://hdl.handle.net/1805/24300
dc.language.iso: en
dc.publisher: ims
dc.relation.isversionof: 10.1214/19-AOAS1256
dc.relation.journal: Annals of Applied Statistics
dc.rights: Publisher Policy
dc.source: Publisher
dc.subject: conditional dependence
dc.subject: record linkage
dc.subject: matching accuracy
dc.title: Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter?
dc.type: Article
Files
Original bundle
Name: Xu_2019_incorporating.pdf
Size: 446.86 KB
Format: Adobe Portable Document Format