The algorithm uses Structured Query Language (SQL) to identify abdominal aortic aneurysm (AAA) cases, controls, and exclusions from the electronic medical record. AAA cases are defined as patients meeting at least one of three criteria: a AAA repair procedure (Case Type 1), at least one vascular clinic encounter with a diagnosis of ruptured AAA (Case Type 2), or at least two vascular clinic encounters with a diagnosis of unruptured AAA (Case Type 3).
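The three case criteria above can be sketched in a few lines of Python. This is only an illustration, not the SQL the algorithm actually uses: the `PatientSummary` record and its field names are hypothetical and assume the relevant counts have already been aggregated per patient from the medical record. Control and exclusion rules are not spelled out here, so the sketch does not cover them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientSummary:
    """Hypothetical per-patient counts, assumed to be pre-aggregated from the EMR."""
    repair_procedures: int = 0         # AAA repair procedures on record
    ruptured_dx_encounters: int = 0    # vascular clinic encounters with a ruptured AAA diagnosis
    unruptured_dx_encounters: int = 0  # vascular clinic encounters with an unruptured AAA diagnosis


def aaa_case_types(patient: PatientSummary) -> List[int]:
    """Return every case type (1, 2, 3) whose criterion the patient meets."""
    types = []
    if patient.repair_procedures >= 1:         # Case Type 1: any AAA repair procedure
        types.append(1)
    if patient.ruptured_dx_encounters >= 1:    # Case Type 2: at least one ruptured AAA encounter
        types.append(2)
    if patient.unruptured_dx_encounters >= 2:  # Case Type 3: at least two unruptured AAA encounters
        types.append(3)
    return types


def is_aaa_case(patient: PatientSummary) -> bool:
    """A patient is a case if at least one of the three criteria is met."""
    return bool(aaa_case_types(patient))
```

A patient with a single unruptured AAA encounter and nothing else is not a case under this sketch, because Case Type 3 requires two such encounters.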
Breast cancer is the most common cancer and the second leading cause of cancer-related death among women in the U.S. Known breast cancer risk factors include age, race/ethnicity, reproductive factors, and benign breast disease. Family history of breast cancer and hereditary cancer syndromes, such as BRCA1/BRCA2 mutations, confer the strongest risk for this disease.
Carotid artery atherosclerosis disease (CAAD) is identified in cases and controls using both structured data, including ICD diagnosis codes, and quantitative measurements of carotid stenosis based on Doppler and other imaging technologies.
The phenotype algorithm includes typical eMERGE pseudo code for implementing the structured data components of the algorithm, as well as a portable natural language processing (NLP) system used to extract percent stenosis measurements from imaging reports.
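To give a feel for what extracting percent stenosis from report text involves, here is a minimal regular-expression sketch in Python. It is not the portable NLP system mentioned above; the pattern, the function name `extract_percent_stenosis`, and the assumed report phrasing (a number or range followed by "%" or "percent" and the word "stenosis" within a few words) are all illustrative assumptions.

```python
import re
from typing import List, Tuple

# Assumed phrasing: "50-69% stenosis", "80% diameter stenosis", "70 to 99 percent stenosis".
# Real imaging reports vary far more than this pattern allows.
_STENOSIS_RE = re.compile(
    r"(\d{1,3})(?:\s*(?:-|to)\s*(\d{1,3}))?\s*(?:%|percent)\s+(?:\w+\s+){0,3}?stenosis",
    re.IGNORECASE,
)


def extract_percent_stenosis(report: str) -> List[Tuple[int, int]]:
    """Return (low, high) percent pairs; a single value yields low == high."""
    results = []
    for match in _STENOSIS_RE.finditer(report):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        # Discard values that cannot be percentages or ranges that run backwards.
        if 0 <= low <= high <= 100:
            results.append((low, high))
    return results
```

The sketch ignores laterality (left versus right carotid), negation, and measurements expressed in words, all of which a production system would need to handle.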
Algorithm to select subjects with "normal" electrocardiograms. Subjects do not have heart disease, interfering medications, or abnormal electrolytes at the time of the normal ECG. Individuals may, however, develop abnormalities later in life.
Hypothetical timeline for a single patient:
Clostridium difficile, also known as "C. diff," is a species of bacteria that causes severe diarrhea and other intestinal disease when competing bacteria in the gut have been wiped out by antibiotics (see Wikipedia entry). In rare cases a C. diff infection can progress to toxic megacolon, which can be life-threatening. In a very small percentage of the adult population, C. difficile bacteria naturally reside in the gut. Other people accidentally ingest spores of the bacteria while they are patients in a hospital or nursing home.
A phenotype defining patients with strong evidence of having been diagnosed with colorectal cancer (cases) and patients who clearly do not have such diagnoses (controls). This phenotype is being used for sequencing studies. The only NLP involved in this phenotype is a very simple string search applied to pathology reports.
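A simple string search over pathology report text might look like the Python sketch below. The search terms in `ASSUMED_TERMS` and the function name `report_mentions_any` are hypothetical; the actual strings used by the phenotype are not listed here.

```python
from typing import Iterable

# Hypothetical search terms, for illustration only.
ASSUMED_TERMS = ("adenocarcinoma", "carcinoma")


def report_mentions_any(report_text: str, terms: Iterable[str] = ASSUMED_TERMS) -> bool:
    """Case-insensitive substring search: True if any term occurs in the report."""
    text = report_text.lower()
    return any(term.lower() in text for term in terms)
```

A plain substring match like this does not account for negation ("no evidence of carcinoma"), so it would normally be combined with other evidence rather than used alone.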
Depression accounts for substantial morbidity and mortality worldwide and risk of experiencing it may have a genetic component. Depressive disorders manifest along a gradient from mild to severe. Electronic health record (EHR) data linked to large, multi-site biobanks facilitate exploration of the genetic component of depression.
An algorithm for finding patients with diverticulosis and, among those, patients who also have diverticulitis, as well as control patients. Control patients will have had a colonoscopy but have no evidence of diverticula.
Simple NLP of colonoscopy reports is the gold standard algorithm (a portable program is posted here, with instructions, and support is available from NU as needed). If the text of colonoscopy reports is not available, an alternate algorithm using CPT and ICD-9 codes, which is also posted, can be used.
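The grouping described above (diverticulosis, the diverticulitis subset, and colonoscopy-confirmed controls) can be sketched as a small Python function. The three boolean inputs are hypothetical: they stand in for whatever evidence the NLP or the CPT and ICD-9 code algorithm produces, and the treatment of contradictory inputs is an assumption of this sketch.

```python
from typing import Optional


def classify_diverticular_status(
    had_colonoscopy: bool,
    has_diverticulosis: bool,
    has_diverticulitis: bool,
) -> Optional[str]:
    """Return 'diverticulitis', 'diverticulosis', 'control', or None if unclassified."""
    if has_diverticulosis:
        # Diverticulitis is only assessed among patients who have diverticulosis.
        return "diverticulitis" if has_diverticulitis else "diverticulosis"
    if has_diverticulitis:
        # Assumption: diverticulitis evidence without diverticulosis evidence is
        # left unclassified, since it still counts as evidence of diverticula.
        return None
    if had_colonoscopy:
        # Controls need a colonoscopy and no evidence of diverticula.
        return "control"
    return None
```

Patients without a colonoscopy and without any diverticular evidence are left unclassified, since the absence of diverticula cannot be confirmed for them.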