Electronic Health Record-based Phenotyping Algorithm for Familial Hypercholesterolemia

Familial hypercholesterolemia (FH) is a relatively common Mendelian genetic disorder that is associated with elevated plasma low-density lipoprotein cholesterol (LDL-C) levels and a dramatically increased lifetime risk of premature atherosclerotic cardiovascular disease (ASCVD). FH can be diagnosed based on clinical presentation and/or genetic testing results, with a positive genetic test considered to be the “gold standard”. Clinical diagnosis is based on a set of clinical criteria including lipid panel testing, personal and family history of hypercholesterolemia or premature ASCVD, presence of xanthomas on extensor tendons or thickening of the Achilles tendon, and early corneal arcus. We provide pseudocode to identify cases and controls for primary hypercholesterolemia and then for FH. Structured data are processed using preset codes, and unstructured data are processed using natural language processing (NLP). The final output consists of (i) a case/control/unknown status for primary hypercholesterolemia, (ii) demographics of each individual (age at the time of qualifying LDL-C ascertainment, gender, race/ethnicity), (iii) lipid profile (total cholesterol, LDL-C, HDL-C, triglycerides), (iv) lipid-lowering treatment and the difference in time between the index date and the date of treatment ascertainment, (v) personal history of premature ASCVD and/or hypercholesterolemia, (vi) family history of premature ASCVD, (vii) xanthomas and/or early corneal arcus, (viii) the Dutch Lipid Clinic Network score and case/control/unknown status for FH.

Unstructured Data:

NLP

Structured data:

CPT

ICD9

Data Source/clinical domain:

observations

medications

Files:

Flowchart_Electronic Health Record-based Phenotyping Algorithm for Familial Hypercholesterolemia

Appendix1_NLP_System

Appendix_2_Testing_Set

Content_Description

Primary_Validation_2017

Supplementary material_ICD10Codes

Pseudocode_Electronic Health Record-based Phenotyping Algorithm for Familial Hypercholesterolemia_Full

Information

Phenotype ID:

602

Date Created:

Thursday, November 10, 2016

Status:

Final

Contact information

Contact Author:

Maya Safarova

Authors:

Safarova MS, Liu H, Arruda-Olson A, Rastegar M, Smith C, Cheng Y, Fan X, Balachandran P, Sohn S, Kullo IJ

Institution:

Mayo Clinic

Network Associations:

eMERGE

View Phenotyping Groups:

eMERGE Geisinger Group

eMERGE Phenotype WG

Owner Phenotyping Groups:

eMERGE Mayo Group

Demographics

Gender:

Female

Male

Age:

Adult

legacy

Ethnicity:

Hispanic

Non-Hispanic

Phenotype Attributes:

Natural Language Processing

Type of Phenotype:

Disease or Syndrome

Suggested Citation

Safarova MS, Liu H, Arruda-Olson A, Rastegar M, Smith C, Cheng Y, Fan X, Balachandran P, Sohn S, Kullo IJ. Mayo Clinic. Electronic Health Record-based Phenotyping Algorithm for Familial Hypercholesterolemia. PheKB; 2016 Available from: https://phekb.org/phenotype/602

PubMed References

27678441

Comments

Implementation questions

At Partners HealthCare, our clinical notes are not structured in a standard way, and it is challenging to use NLP to determine family history. We would like to know more about your NLP methods.

1. Does the program require that notes be in a specific format? Do the notes have to include a family history section?

2. Are the notes coming from a particular software program?

3. Did the validation site have the same formatting in their notes?

4. How does the Java program deal with unstructured notes in other formats? If so, has this been tested and does it work?

Thanks, Beth Karlson

Implementation

Dear Beth, Thank you for this feedback. Please see below our comments. Maya.

1. Does the program require that notes be in a specific format? Do the notes have to include a family history section?
- NLP is run using MedTagger. Per “FH_eAlgorithm_Pseudocode_FullText_2016”, a link to the installation and user guides can be found here:
http://ohnlp.org/index.php/MedTagger_Project_Page
There is no specific format requirement for the patient notes (free text).
At the primary site, Mayo Clinic, we used only the “Family History” section of clinical notes.
Please see “VALIDATION OF THE FH eALGORITHM IN THE GEISINGER HEALTH SYSTEM_2017” for the feedback from the validation site: “In selected cases based on the adopted strategy to record encounters in the index implementation center, search space for the family history of early-onset ASCVD could be expanded to the “Personal|Past Medical History”.”

2. Are the notes coming from a particular software program?
Given the diversity of medical language, we advise adapting the NLP system to the way encounters are recorded at the implementation site.
Regardless of the EHR vendor, free text within the generated clinical notes can be processed by MedTagger.

3. Did the validation site have the same formatting in their notes?
Since primary and validation sites used different EHR vendors, there were differences in the baseline formatting of the clinical notes.

4. How does the Java program deal with unstructured notes in other formats? If so, has this been tested and does it work?
Given that the input is plain text from the sections relating to an individual's FHx, formatting should not be an issue. A list of references that may be helpful can be found here: http://ohnlp.org/index.php/OHNLP_Publications

no fasting glucose

For sites like ours that are not able to distinguish fasting from non-fasting glucose tests, can we just use a regular glucose test as the exclusion (Table 2A)? If so, what would the cutoff be (still >220 mg/dl)?

Fasting glucose

Hi Jen, Could you please share the codes that you use to extract glucose levels? Please feel free to email me at Safarova.Mayya@mayo.edu. Thanks.

NLP of family history of premature CHD, CVD, PAD

If during NLP we find genderless terms such as child, children, kids, sibling, etc., what age do we use to determine whether they have a premature condition: Age <=65 or Age <=55?

NLP for family history of premature ASCVD

Hi Jim, we would suggest using a threshold of <=60 years. Could you please share specific examples from your sample? Thanks.

NLP examples including genderless terms

Here are some examples including genderless terms such as (child, kid, sibling, etc...)

For family history of premature ASCVD: Patient has three siblings, one of whom had a stroke at age 56.

For family history of premature CHD: The patient has two siblings total one died at age 49 of heart disease and another is living at age 70.

Re: NLP examples including genderless terms

Hi Jim, Thank you for sharing these examples. We would suggest using a threshold of 60 y.o. for these genderless cases. We will update the pseudocode accordingly. Thanks. Maya
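
The age cut-offs discussed in this thread can be written as a small lookup. This is a minimal Python sketch, not the published pseudocode: the function name and the sex labels are hypothetical, the male and female limits follow the gender-specific cut-offs stated on this page (before age 56 in men, before age 66 in women), and the limit for genderless terms is the 60-year threshold suggested in the reply above.

```python
from typing import Optional

# Inclusive upper age limits for a "premature" event in a relative.
# "unknown" covers genderless terms such as child, kid, or sibling.
PREMATURE_AGE_MAX = {"male": 55, "female": 65, "unknown": 60}


def is_premature_event(age_at_event: Optional[int],
                       relative_sex: str = "unknown") -> Optional[bool]:
    """Return True/False, or None when the note gives no age at all."""
    if age_at_event is None:
        return None
    # Any label other than "male"/"female" falls back to the genderless limit.
    cutoff = PREMATURE_AGE_MAX.get(relative_sex, PREMATURE_AGE_MAX["unknown"])
    return age_at_event <= cutoff
```

With these assumptions, the sibling who had a stroke at age 56 and the sibling who died of heart disease at age 49 both count as premature events, whereas a brother with an event at age 56 would not.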

Notes do not have a family history section

Maya, Thanks for your response to my initial question. In our EHR, we do not have a separate family history section. Does your program work if there is no clear designation of a section? Have any of the other sites implemented your program in unstructured notes? Thanks, Beth

Re: Notes do not have a family history section

Hi Beth, In EHRs with no designated family history section (structured or unstructured), the system will scan the whole text. In this scenario, we anticipate an increased probability of false-positive results.
However, we have no feedback to share yet from sites with EHRs without note sections/types. We look forward to learning from your experience. P.S. One option to consider is demarcating a search space, e.g. keywords and negation --> age brackets and relatedness, within 2-3 sentences. Thank you! Maya
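
The idea in the P.S. above (restricting the search to a few neighbouring sentences) might look roughly like the following. This is only an illustrative Python sketch using regular expressions; it is not MedTagger and not the eAlgorithm's NLP logic. The term lists, the function name, and the crude whole-window negation check are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny term lists; a site would use its own dictionaries.
RELATIVE = re.compile(
    r"\b(mother|father|brother|sister|siblings|sibling|son|daughter|children|child)\b",
    re.I,
)
CONDITION = re.compile(
    r"\b(stroke|heart attack|myocardial infarction|heart disease)\b", re.I
)
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b", re.I)
AGE = re.compile(r"\b(?:age|aged|at)\s+(\d{1,3})\b", re.I)


def family_history_hits(text, window=3):
    """Return (relative, condition, age) tuples found within `window` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    hits = []
    for i, sentence in enumerate(sentences):
        rel = RELATIVE.search(sentence)
        if not rel:
            continue
        # Search space: the sentence naming the relative plus the next ones.
        span = " ".join(sentences[i:i + window])
        cond = CONDITION.search(span)
        if not cond or NEGATION.search(span):
            continue  # crude rule: any negation cue in the window discards the hit
        age = AGE.search(span)  # first age mentioned in the window, if any
        hits.append((rel.group(1).lower(), cond.group(1).lower(),
                     int(age.group(1)) if age else None))
    return hits
```

A sentence mentioning several ages (for example one sibling's age at death and another's current age) shows the limits of this approach: the sketch simply takes the first age it sees, so real notes would need more careful linking of ages to relatives.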

NLP program - MedTagger

Does MedTagger only run against files, or can it be run against a database as well? In reading the User Guide, it looks as though it only runs against either a single file or multiple files. Is that correct?

Thanks, Barbara

RE: NLP program - MedTagger

Hi Beth, True. Input for MedTagger is text file(s).

Demographics_Data_Dictionary

Please note that the data dictionary (dd) for demographics has been updated. 'Index date' was removed. Two additional variables pertinent to LDL and the DLCN score were added. Please let me know if you have any questions. Thanks.

NLP_Feedback 1

Execution steps:
(i). Download version 1.0.1 from https://sourceforge.net/projects/ohnlp/files/MedTagger/
(ii). Follow the installation guide available at http://ohnlp.org/index.php/MedTagger_Install_Guide
MedTagger is an open source UIMA (Unstructured Information Management Architecture)-based NLP tool, so it does require UIMA experience.
RunMedTaggerCVD.bat file: java -Xms512M -Xmx1024M -cp resources;desc;descsrc;MedTagger-1.0.1.jar org.apache.uima.tools.cvd.CVD

F.A.Q.:
1. Input:
- The NLP part of the eAlgorithm can use any clinical notes available in the EHR. At the Mayo site, we used predominantly the “family history” section of the EHR along with PPI (patient provided information, which is a structured source). We started by scanning all note types and sections. However, in this quasi-experiment we found that focusing on the FHx section alone is sufficient. This improved the accuracy of case detection, reduced the noise, and shortened the time spent. Local practices for recording patient notes differ, so the search space should be tailored at the site level.

2. Family history
- For MedTagger, the input can be one file encompassing all notes or multiple files split by targeted sections. The note does not have to be structured. Sectionizing is not obligatory. However, access to the section numbers/IDs improves accuracy. Notes should be decrypted. MedTagger detects concepts by converting free-text terms to normalized ones, which allows indexing based on dictionaries. The absence of a family history section per se does not preclude extraction of pertinent family member data.

3. NLP for personal history or physical examination.
Please check this file https://phekb.org/sites/phenotype/files/FH_eAlgorithm_Pseudocode_FullTex... --> Figure 1: PE: Algorithms 5&6; PHx of ASCVD: Algorithms 3&4 (pp. 21-26). A different visual depiction can be found here: https://phekb.org/sites/phenotype/files/Appendix_1_0.pdf

4. MedTagger and SQL Server
MedTagger requires physical files.
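
For sites that keep notes in a relational database, one possible workaround is to export each note to its own text file before running MedTagger. The sketch below is an assumption-laden Python illustration: it uses the standard-library `sqlite3` module as a stand-in for the site's database, and the table name `notes` and columns `note_id` and `note_text` are hypothetical.

```python
import sqlite3
from pathlib import Path


def export_notes(conn, out_dir,
                 query="SELECT note_id, note_text FROM notes ORDER BY note_id"):
    """Write one UTF-8 text file per note row and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for note_id, note_text in conn.execute(query):
        path = out / f"{note_id}.txt"
        # Empty or NULL notes still produce a (blank) file so IDs stay aligned.
        path.write_text(note_text or "", encoding="utf-8")
        written.append(path)
    return written
```

The resulting directory of `.txt` files can then be given to MedTagger as its file input. For another database engine, only the connection object and the query would need to change.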

Lipid-lowering Medications

The timeframe for the medication search is 1 year prior to the index date (date of the qualifying LDL), excluding the six weeks immediately before the index date. If there is an order/prescription within this time frame --> label the person as ON LLT --> make an LDL correction assuming a 30% reduction in LDL on a statin ("Recalc LDL" variable). If no statin or any other lipid-lowering drug from the list is identified, use the LDL level as-is (still recorded as the "Recalc LDL" variable per the demographics dd). To report the medications in the dd, pull the one closest to the index date and give preference to drugs from the statin class.
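
The medication window and LDL correction described above could be sketched as follows. This is a hedged Python illustration, not the pseudocode itself: the function and constant names are hypothetical, the window boundaries are treated as inclusive, the correction is assumed to mean dividing the measured LDL by 0.70 (undoing a 30% reduction), and the same correction is assumed for any lipid-lowering drug on the list.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=365)   # "1 year prior to the index date"
EXCLUSION = timedelta(weeks=6)   # six weeks immediately before the index date
ASSUMED_REDUCTION = 0.30         # assumed LDL lowering while on treatment


def on_llt(index_date, order_dates):
    """True if any order/prescription falls in the search window."""
    start = index_date - LOOKBACK
    end = index_date - EXCLUSION
    return any(start <= d <= end for d in order_dates)


def recalc_ldl(ldl, index_date, order_dates):
    """Return the 'Recalc LDL' value: corrected if on LLT, else the LDL as-is."""
    if on_llt(index_date, order_dates):
        return ldl / (1 - ASSUMED_REDUCTION)
    return ldl
```

Under these assumptions, a measured LDL of 140 mg/dL with a prescription five months before the index date is corrected to about 200 mg/dL, while a prescription dated two weeks before the index date leaves the value at 140 mg/dL.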

Recalc_LDL

Recalc LDL cannot contain 0 values. Please include uncorrected LDL levels if the person is not on LLT within the prespecified time interval. Thanks!

Structured data only version of FH phenotype algorithm?

Is there a useful implementation of this algorithm that does not require NLP and relies exclusively on structured data? We ask because NLP development work for this phenotype at KPW would be prohibitively time consuming. This is because 1) KPW notes lack regular section headings or cues that would facilitate identifying family history documentation, and 2) the FH NLP system provided only works with clinical notes stored as individual text files, which adds additional work for sites like KPW that store clinical notes in a relational database. (We note that other phenotype-specific NLP systems for some prior eMERGE phenotypes have accommodated input from databases as well as individual files.)

Best,

David

Hi David,

Sure, we totally understand the difficulty of running NLP. You can disregard the NLP component without hurting the algorithm much.

Basically, the FH algorithm calculates a DLCN score, which is composed of an LDL score, a personal history score, a family history score and a physical examination score. NLP was involved in searching personal history and family history. If you cannot run NLP, personal history can still be checked using ICD codes. We will miss the points from family history, but that is unavoidable.
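
The effect of dropping NLP on the score can be illustrated with a small Python sketch. The class and function names are hypothetical, and the component values are inputs rather than computed here, since the point thresholds live in the pseudocode document and are not reproduced in this thread.

```python
from dataclasses import dataclass


@dataclass
class DlcnComponents:
    """Per-patient component scores, computed elsewhere per the pseudocode."""
    ldl: int
    personal_history: int
    family_history: int
    physical_exam: int


def dlcn_total(c: DlcnComponents, nlp_available: bool = True) -> int:
    """Sum the components; without NLP the family history points are dropped."""
    family = c.family_history if nlp_available else 0
    return c.ldl + c.personal_history + family + c.physical_exam
```

In a structured-data-only run the total is therefore a lower bound on the score that a run with NLP would have produced for the same patient.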

Please feel free to post any other questions. Thank you for implementing the algorithm.
Xiao

We understand, as you say, that "personal history can still be checked using ICD codes". Are there also ICD codes for FAMILY history? If so, please post. Also, please clarify which ICD code you want us to include for "personal history."

Thank you,

David

ICD codes

Hi David,

1. Please see “Input to the eAlgorithm for familial hypercholesterolemia” in the Flowchart_Electronic Health Record-based Phenotyping Algorithm for Familial Hypercholesterolemia.

To identify personal history of CHD and/or CVD / PAD please start with a set of ICD codes available in Table 4 in https://phekb.org/sites/phenotype/files/FH_eAlgorithm_Pseudocode_FullTex...
and Table 4 in https://phekb.org/sites/phenotype/files/Map_ICD9_2_ICD10_CS_MSS_03222017...

General remarks: Premature ASCVD case status is defined by the presence of two or more pertinent diagnosis and/or procedure codes in the EHR before age 56 in men and 66 in women. The two or more corresponding codes should be found during the same time frame and before the gender-specific age cut-offs, with at least 5 days separating the two codes. Assigned codes should be evaluated at discharge from each encounter during the surveillance period.
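
A minimal Python sketch of the code-based rule above, under stated assumptions: `code_dates` is taken to be already filtered to the pertinent codes from Table 4 and to the site's surveillance period, the function names and sex labels are hypothetical, and "at least 5 days separating the two codes" is read as the earliest and latest eligible codes being at least 5 days apart.

```python
from datetime import date, timedelta

AGE_LIMIT = {"male": 56, "female": 66}   # codes must be dated before this age
MIN_GAP = timedelta(days=5)


def age_on(birth_date, d):
    """Age in completed years on date d."""
    before_birthday = (d.month, d.day) < (birth_date.month, birth_date.day)
    return d.year - birth_date.year - before_birthday


def premature_ascvd(birth_date, sex, code_dates):
    """True if two or more qualifying codes fall before the cut-off, >= 5 days apart."""
    limit = AGE_LIMIT[sex]
    eligible = sorted(d for d in code_dates if age_on(birth_date, d) < limit)
    # If the earliest and latest eligible codes are >= 5 days apart,
    # at least one pair of codes satisfies the separation rule.
    return len(eligible) >= 2 and eligible[-1] - eligible[0] >= MIN_GAP
```

For example, a man born on 1960-01-01 with codes on 2010-03-01 and 2010-03-10 would qualify, whereas the same codes dated two days apart would not.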

To increase the sensitivity (SN) and specificity (SP) of case-control status identification, please refer to Algorithms 3 and 4 for the NLP logic in https://phekb.org/sites/phenotype/files/FH_eAlgorithm_Pseudocode_FullTex...

Should any challenges occur with NLP implementation at your site, please proceed with the code-based logic.

2. To identify family history of ASCVD (=CHD and/or CVD / PAD) or FHx of hypercholesterolemia we used NLP. ICD codes can also be used, with less accuracy. Please see page 26 in https://phekb.org/sites/phenotype/files/Map_ICD9_2_ICD10_CS_MSS_03222017...

Family history: The following ICD codes will return 1 or 0 for the FHx component:
ICD9 code V17.3 for Family history of ischemic heart disease.
ICD10 code Z82.49 for Family history of ischemic heart disease and other diseases of the circulatory system
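
The two codes above can be turned into a 1/0 flag with a few lines of Python. This is a sketch only: the function name and the `(code_system, code)` input shape are assumptions about how a site might hold its diagnosis data.

```python
# Family history codes listed above, keyed by coding system.
FHX_CODES = {"ICD9": {"V17.3"}, "ICD10": {"Z82.49"}}


def fhx_component(patient_codes):
    """patient_codes: iterable of (code_system, code) pairs. Returns 1 or 0."""
    return int(any(
        code.strip().upper() in FHX_CODES.get(system.strip().upper(), set())
        for system, code in patient_codes
    ))
```

A patient with `("ICD10", "Z82.49")` anywhere in the record gets 1; a record with neither code gets 0.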

Thank you.

Maya

Family history of hyper-C per "PPI (Table 5D)"

Please explain whether the information provided in the pseudocode document under "PPI (Table 5D)" is unique to Mayo Clinic. If it is, please explain how you would like us to implement something similar that is relevant to our setting (assuming a simple translation is feasible).

Thanks,

David

Patient provided information

PPI, another section available in our EHR platform (GE system at the time of the development and implementation of the eAlgo), was utilized as a structured data source. We found that since FHx is not recorded systematically by health care providers, PPI could be leveraged for this purpose.
Which EHR system are you currently using? Is there any section within your system that is filled out by the patient and contains information relevant to FHx? If you scan these data into the chart, are they transcribed?

Thanks, Maya
