Browsing by Subject "record linkage"
Item
Comparison of Supervised Machine Learning and Probabilistic Approaches for Record Linkage
(AMIA Informatics Summit 2019 Conference Proceedings, 2020-03-25)
McNutt, Andrew T.; Grannis, Shaun J.; Bo, Na; Xu, Huiping; Kasthurirathne, Suranga N.
Record linkage is vital to prevent fragmentation of patient data. Machine learning approaches present considerable potential for record linkage. We compared the performance of three machine learning algorithms to an established probabilistic record linkage technique. Machine learning approaches exhibited results that were comparable or statistically superior to those of the established probabilistic approach. It is unclear whether the cost of manually reviewing datasets for supervised learning is justified by the performance improvements these approaches yield.

Item
Evaluating the effect of data standardization and validation on patient matching accuracy
(Oxford, 2019-05)
Grannis, Shaun; Xu, Huiping; Vest, Josh; Kasthurirathne, Suranga; Bo, Na; Moscovitch, Ben; Torkzadeh, Rita; Rising, Josh; Family Medicine, School of Medicine
Objective: This study evaluated the degree to which recommendations for demographic data standardization improve patient matching accuracy using real-world datasets.
Materials and Methods: We used 4 manually reviewed datasets, containing a random selection of matches and nonmatches. Matching datasets included health information exchange (HIE) records, public health registry records, Social Security Death Master File records, and newborn screening records. Standardized fields included last name, telephone number, social security number, date of birth, and address. Matching performance was evaluated using 4 metrics: sensitivity, specificity, positive predictive value, and accuracy.
Results: Standardizing address was independently associated with improved matching sensitivity for the public health and HIE datasets, by approximately 0.6% and 4.5%, respectively. Overall accuracy was unchanged for both datasets due to reduced match specificity.
We observed no similar impact for address standardization in the death master file dataset. Standardizing last name improved matching sensitivity by 0.6% for the HIE dataset, while overall accuracy remained the same due to a decrease in match specificity. We noted no similar impact for other datasets. Standardizing other individual fields (telephone, date of birth, or social security number) showed no matching improvements. Because standardizing address and last name each improved matching sensitivity, we examined their combined effect, which improved sensitivity from 81.3% to 91.6% for the HIE dataset.
Conclusions: Data standardization can improve match rates, thus ensuring that patients and clinicians have better data on which to make decisions to enhance care quality and safety.

Item
Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter?
(IMS, 2019)
Xu, Huiping; Li, Xiaochun; Shen, Changyu; Hui, Siu L.; Grannis, Shaun; Family Medicine, School of Medicine
The conditional independence assumption of the Fellegi and Sunter (FS) model in probabilistic record linkage is often violated when matching real-world data. Ignoring conditional dependence has been shown to seriously bias parameter estimates. However, in record linkage, the ultimate goal is to inform the match status of record pairs, and record linkage algorithms should therefore be evaluated in terms of matching accuracy. In the literature, more flexible models have been proposed to relax the conditional independence assumption, but few studies have assessed whether such accommodations improve matching accuracy. In this paper, we show, using three real-world data linkage examples, that appropriately incorporating conditional dependence yields matching accuracy comparable to or better than that of the FS model.
Through a simulation study, we further investigate when conditional dependence models provide improved matching accuracy. Our study shows that the FS model is generally robust to violations of the conditional independence assumption and provides matching accuracy comparable to that of the more complex conditional dependence models. However, when the match prevalence approaches 0% or 100% and conditional dependence exists in the dominating class, it is necessary to address conditional dependence because the FS model produces suboptimal matching accuracy. Addressing conditional dependence becomes less important when highly discriminating fields are used. Our simulation study also shows that conditional dependence models with a misspecified dependence structure could produce less accurate record matching than the FS model, and we therefore caution against the blind use of conditional dependence models.
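
As a rough illustration of the conditional independence assumption discussed in the abstract above, the following minimal sketch scores a record pair in the Fellegi-Sunter style: each field contributes its own log likelihood ratio, and the contributions are simply added. The field names, the m- and u-probabilities, and the function names are hypothetical values chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
import math

# Hypothetical parameters (assumptions, not from the paper):
# m = P(field agrees | records match), u = P(field agrees | records do not match)
M_PROBS = {"last_name": 0.95, "date_of_birth": 0.97, "zip_code": 0.90}
U_PROBS = {"last_name": 0.01, "date_of_birth": 0.003, "zip_code": 0.05}


def match_weight(agreement):
    """Sum per-field log2 likelihood ratios, assuming conditional independence."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROBS[field], U_PROBS[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def match_probability(agreement, prevalence):
    """Posterior match probability from the weight and the match prevalence (0 < prevalence < 1)."""
    likelihood_ratio = 2.0 ** match_weight(agreement)
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


all_agree = {"last_name": True, "date_of_birth": True, "zip_code": True}
none_agree = {"last_name": False, "date_of_birth": False, "zip_code": False}

print(match_weight(all_agree))
print(match_probability(all_agree, prevalence=0.01))
```

The additive form of `match_weight` is exactly what conditional independence buys: fields are treated as independent given the match status. A conditional dependence model would instead need a joint probability for each agreement pattern within each class, which is where a misspecified dependence structure can hurt.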