Equivalence of Type 2 Diabetes Prevalence Estimates: Comparative Study of Similar Phenotyping Algorithms Using Electronic Health Record Data
Abstract
Background: Timely surveillance of diabetes mellitus remains a challenge for public health agencies. In this study, researchers compared type 2 diabetes (T2D) prevalence estimates using electronic health record (EHR) data and computable phenotypes (CPs) as defined and applied by 2 independent networks. One network, Diabetes in Children, Adolescents, and Young Adults, was a research consortium, and the other, the Multi-State EHR-Based Network for Disease Surveillance, is a practice-based public health surveillance network.
Objective: This study sought to determine the equivalence of T2D prevalence estimates generated by 2 distinct, yet conceptually related, CPs using EHR data.
Methods: Each network used diagnostic, laboratory, and medication data for young adults (aged 18-44 years) extracted from the Indiana Network for Patient Care (INPC) to independently calculate prevalence of T2D using distinct CPs for the year 2022. The INPC is a statewide health information exchange that receives EHR data from multiple health care systems and supports public health use cases such as surveillance. The two one-sided tests (TOST) method for equivalence, with a predefined margin of -2.5 to +2.5 percentage points, was used to compare the estimated prevalence as previously derived from the Multi-State EHR-Based Network for Disease Surveillance and Diabetes in Children, Adolescents, and Young Adults networks. The TOST procedure tests whether the observed difference between 2 estimates lies within the margin; when both one-sided tests are significant, the difference is considered small and practically insignificant. Results at the overall level, and stratified by sex, age, and race or ethnicity, were examined.
Results: Overall prevalence estimates for 2022 were 4.1% for CP 1 and 2.4% for CP 2. Although prevalence estimates for CP 1 were consistently higher than those for CP 2, absolute differences were generally less than 2.5 percentage points, and the estimates were statistically equivalent (P<.001). The only exception was for Hispanic individuals, where equivalence could not be established (P=.2): prevalence was 5.4% for CP 1 versus 3.0% for CP 2, a difference of 2.4 (95% CI 2.2-2.6) percentage points, with a CI that extends beyond the 2.5-point margin. Other groups with relatively larger differences that nonetheless fell within the equivalence margin included male individuals (4.6% for CP 1 vs 2.3% for CP 2), individuals aged 35-44 years (6.9% for CP 1 vs 4.9% for CP 2), and African American individuals (5.5% for CP 1 vs 3.7% for CP 2). Therefore, we concluded that the 2 CPs largely produced equivalent estimates of T2D prevalence.
Conclusions: The 2 independent CPs demonstrated equivalent T2D prevalence estimates, except in Hispanic individuals. Although the CPs can be considered statistically equivalent, the data driving each CP may impact accuracy and completeness. CP 1 was broader, incorporating clinical diagnoses, laboratory data, and medication, whereas CP 2 used clinical diagnostic codes alone. These results have implications for improving harmonization of CPs for public health surveillance.
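
The equivalence test described in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes a normal (Wald) approximation and treats the 2 estimates as coming from independent samples, which the abstract does not confirm. The function name `tost_two_proportions` and every sample size shown are hypothetical, because the abstract does not report denominators.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_proportions(p1, n1, p2, n2, low=-0.025, upp=0.025):
    """Two one-sided tests (TOST) for the difference p1 - p2.

    Assumptions (not from the source): Wald standard error and
    independent samples. The margin defaults to +/-2.5 percentage points.
    Returns (difference, TOST P value); a small P value supports equivalence.
    """
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    norm = NormalDist()
    # Test 1: H0 diff <= low, rejected when diff is well above the lower bound.
    p_lower = 1 - norm.cdf((diff - low) / se)
    # Test 2: H0 diff >= upp, rejected when diff is well below the upper bound.
    p_upper = norm.cdf((diff - upp) / se)
    # Equivalence requires both tests to reject, so report the larger P value.
    return diff, max(p_lower, p_upper)


# Overall estimates from the abstract; n = 100_000 per CP is hypothetical.
diff_all, p_all = tost_two_proportions(0.041, 100_000, 0.024, 100_000)
print(f"overall: diff={diff_all:.3f}, equivalent at 0.05: {p_all < 0.05}")

# Hispanic estimates from the abstract; n = 20_000 per CP is hypothetical.
diff_h, p_h = tost_two_proportions(0.054, 20_000, 0.030, 20_000)
print(f"Hispanic: diff={diff_h:.3f}, equivalent at 0.05: {p_h < 0.05}")
```

Because the P value depends on the standard error, a difference just inside the margin (such as 2.4 percentage points) can still fail the equivalence test when its confidence interval crosses the 2.5-point bound.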
