This document describes the Stanford University algorithm for identifying individuals with diabetes, and their type of diabetes, from electronic health records (EHRs). Phenotype development had two main tasks: 1) to extract patients with diabetes (gestational diabetes is excluded), and 2) to discriminate between type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM). Rather than identifying all diabetes cases, we aim to reduce the number of false positives in our diabetes cohort. The former is crucial for public health surveillance, whereas the latter is what clinical research use requires.
Algorithm Description
Individuals with diabetes were identified by the presence of diagnosis codes combined with either abnormal laboratory results or prescriptions for diabetes-related medications. We then modified the Klompas (2013) algorithm [1] by including additional ICD-10 diagnosis codes to classify T1DM vs. T2DM. Structured data required from EHRs include:
• Diagnosis codes (ICD-9 and ICD-10)
• Prescribed medications (RxNorm)
• Laboratory test results (LOINC)
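The case-identification rule described above (a diagnosis code combined with either an abnormal laboratory result or a diabetes-related prescription) can be sketched as a simple boolean check. This is a minimal illustration, not the Stanford implementation: the `PatientEvidence` class, its field names, and `is_diabetes_case` are hypothetical, and the flags are assumed to have been derived upstream from the ICD, LOINC, and RxNorm data. The specific code lists, laboratory thresholds, and the T1DM vs. T2DM classification step are not shown.

```python
from dataclasses import dataclass


@dataclass
class PatientEvidence:
    """Hypothetical per-patient flags, assumed to be derived from structured EHR data."""
    has_diabetes_dx: bool    # qualifying ICD-9/ICD-10 diabetes code (gestational excluded)
    has_abnormal_lab: bool   # abnormal diabetes-related laboratory result (LOINC)
    has_diabetes_med: bool   # diabetes-related medication prescription (RxNorm)


def is_diabetes_case(p: PatientEvidence) -> bool:
    # A diagnosis code alone is not sufficient: it must be supported by
    # at least one of an abnormal lab result or a diabetes-related medication.
    return p.has_diabetes_dx and (p.has_abnormal_lab or p.has_diabetes_med)


if __name__ == "__main__":
    dx_only = PatientEvidence(True, False, False)
    dx_and_med = PatientEvidence(True, False, True)
    print(is_diabetes_case(dx_only), is_diabetes_case(dx_and_med))
```

Requiring supporting evidence in addition to the diagnosis code reflects the stated goal of reducing false positives, at the possible cost of missing some true cases.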
Algorithm Performance
T1DM identification
- Precision = 0.913
- Recall = 0.987
- F1 score = 0.948
- Accuracy = 0.964
T2DM identification
- Precision = 0.979
- Recall = 0.958
- F1 score = 0.968
- Accuracy = 0.959
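For reference, the four metrics reported above follow the standard definitions based on true/false positives and negatives. The sketch below shows those definitions; the function names and the example counts are hypothetical and are not taken from the validation data behind the reported figures.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics from confusion-matrix counts (counts are illustrative inputs)."""
    precision = tp / (tp + fp)                   # share of predicted cases that are true cases
    recall = tp / (tp + fn)                      # share of true cases that were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all patients classified correctly
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from_precision_recall(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
    # Recomputing F1 from the reported (rounded) T2DM precision and recall.
    print(round(f1_from_precision_recall(0.979, 0.958), 3))
```

Recomputing F1 from precision and recall that are already rounded to three decimals can differ from the reported F1 in the last digit.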