Phenotype ID: 
Type of Phenotype: 
Lyam Vazquez, John Connolly
Contact Author: 

Suggested Citation

Lyam Vazquez, John Connolly. CHOP. Asthma. PheKB; 2013 Available from:


The algorithm document mentions an attached data dictionary, but I do not see one.


I uploaded the data dictionary. However, we are still trying to map the RxNorm terminology.


Phenotype states under Case exclusion criteria:

"Individuals with any positive confirmation of "wheezing", "asthma" or "asthma exacerbation" in the medical record as shown in table 3". 

Should this be an NLP "record exclusion" rather than an "individual case subject exclusion", since a subject could have asthma and also have mentions of family relation(s) with asthma?

Do we just want to exclude NLP records having strings as reported in table 3?



Yes, thank you for pointing that out.   It should be an NLP record exclusion as you described.
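
To illustrate the difference, here is a minimal sketch of a record-level exclusion: each NLP record that matches a term is dropped, but the subject's other records are kept. The terms below are the three quoted above and stand in for the full Table 3 list; the function name, the (subject_id, text) record layout, and the simple case-insensitive substring match are assumptions for illustration, not part of the algorithm document.

```python
# Stand-ins for the Table 3 strings (assumption: the real list lives in the algorithm document).
EXCLUSION_TERMS = ("wheezing", "asthma", "asthma exacerbation")


def filter_nlp_records(records, terms=EXCLUSION_TERMS):
    """Drop only the records whose text contains a term; never drop the whole subject.

    records: iterable of (subject_id, text) tuples (hypothetical layout).
    """
    kept = []
    for subject_id, text in records:
        lowered = text.lower()
        if not any(term in lowered for term in terms):
            kept.append((subject_id, text))
    return kept


records = [
    ("S1", "Mother has asthma."),
    ("S1", "Well child visit, no complaints."),
    ("S2", "Wheezing noted on exam."),
]
kept = filter_nlp_records(records)
```

Here subject S1 keeps the second record even though the first one is excluded.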

Inclusion of cases by NLP only was deleted, and some ICD9 codes were added for exclusion of cases and controls.


Just need some clarification on some of the tabs:

1. Demographics

Atopy - what are the encoded values for Yes, No, Unknown?

2. Hx of asthma meds

Drug code - I know you mentioned this, so what is needed here? RXCUI?

3. Labs 

Can I get a sample set of data for this tab? The reason I ask is that our data is set up as key-value pairs. We have a component name and a value, in which the IGE is considered a component, and not all tests have an IGE component listed. I just want to make sure I get my output formatted correctly.


1. Atopy: Let's do Yes = 1, No = 0, Missing = .

2. Drug code: Yes, we will need RXCUI for the drug. We started using MedEx, the program that Josh talked about in the meeting, but we are still figuring it out.

3. Sample set of data: We can send you a sample set.  Frank will work on getting the sample and sending it to you.


I will update the data dictionary with the encoded values for Atopy.
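
For sites preparing the Demographics tab, the encoding agreed above (Yes = 1, No = 0, Missing = .) could be applied with something like the sketch below. The function name is hypothetical, and treating "Unknown" or any unrecognized value as missing is an assumption, since only the three codes above were specified.

```python
# Agreed codes from this thread: Yes = 1, No = 0, Missing = .
ATOPY_CODES = {"yes": "1", "no": "0"}
MISSING = "."


def encode_atopy(value):
    """Return "1" for Yes, "0" for No, and "." otherwise.

    Assumption: "Unknown", blanks, and None are all reported as missing.
    """
    if value is None:
        return MISSING
    return ATOPY_CODES.get(str(value).strip().lower(), MISSING)


encoded = [encode_atopy(v) for v in ["Yes", " no ", "Unknown", None]]
```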

Also, I just want to clarify that in the flowchart we mention stratification by severity, but we will do that ourselves, with the data that you guys send. 


Hi, we updated the labs and CPT codes tabs of the data dictionary.  We added a description and a sample set of data for reporting lab results.

Submitted by Josh Denny on

We have validated an algorithm in adults to capture asthma exacerbation as hospitalizations/ER visits via a PGPop/PGRN project.  Should we use that instead and/or in addition to the record of ICD9s?  I worry that just the ICD9 may have poorer PPV.

Hi Josh,

Thank you for the suggestion. I think it would be best to capture asthma exacerbation as hospitalizations and ER visits in addition to the ICD9 codes. If you wouldn't mind sharing the algorithm, that would also be really helpful to us.

Submitted by Josh Denny on

Also - not everyone has drugs mapped to RxNorm codes.  If you want RxNorm codes, can you provide the RxNorm codes for each med you want?  That would standardize the listing of the meds in the med list.  MedEx-UIMA should provide this if you want.


Do you want this as an EAV with repeating values for every record of the med in the chart?  That could be very large for NLP-derived med lists (like ours).  If you want multiple values, an option would be to record the first and last date mentioned.

I just added the RxNorm codes of asthma meds to the data dictionary. We got them from MedEx, so thank you. For us, it would be best to have an EAV for every record of the med in the chart.
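
As a rough sketch of the two options discussed here, the snippet below builds one EAV row per medication mention, and also shows the compact alternative of keeping only the first and last date per drug. The row layout, the function names, and the placeholder RXCUI values are assumptions for illustration; the actual column layout is whatever the data dictionary specifies.

```python
from datetime import date

# Hypothetical mentions: (subject_id, rxcui, date_of_mention).
# "RXCUI_A" and "RXCUI_B" are placeholders, not real RxNorm codes.
mentions = [
    ("S1", "RXCUI_A", date(2010, 3, 1)),
    ("S1", "RXCUI_A", date(2012, 7, 15)),
    ("S1", "RXCUI_B", date(2011, 1, 5)),
]


def to_eav(mentions):
    """One row per mention: (entity, attribute, value, date)."""
    return [(sid, "rxcui", rxcui, day.isoformat()) for sid, rxcui, day in mentions]


def first_last(mentions):
    """Compact alternative: first and last mention date per (subject, drug)."""
    spans = {}
    for sid, rxcui, day in mentions:
        key = (sid, rxcui)
        lo, hi = spans.get(key, (day, day))
        spans[key] = (min(lo, day), max(hi, day))
    return spans


eav_rows = to_eav(mentions)
spans = first_last(mentions)
```

The EAV output grows with every mention in the chart, which is the size concern raised above for NLP-derived med lists; the first/last summary stays at one entry per subject and drug.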

In the ICD9 code list, we have 518.801, 518.811 and 556 with description 'Idiopathic Proctocolitis'.  Do you want us to search for other codes?

For BMI at Diagnosis Date, when we run this on the diagnosis date only, just 80 of our 326 cases have an actual BMI on that date. Should we use a window around the diagnosis date to pick up more BMIs, or is this what you want? If so, what window would be appropriate? 6 months? 1 year?




From the algorithm document, it says:

"Atopic: Individuals with ICD9 code 493.0

or two or more ICD9 codes in Table 3 on separate calendar days."

We are interpreting this as:

Individuals with ICD9 code 493.0

OR

Individuals with ICD9 codes in Table 3 on two or more separate calendar days.


Is this correct or how else should we interpret?

Sorry, to us the sentence from the document is ambiguous.



If there is no BMI at time of diagnosis, please list all BMIs within a 12-month window.

Re. BMI for controls, similarly, it would be best to have all values linked with age in days.
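
A minimal sketch of the BMI rule for cases, with dates expressed as age in days: return the BMI recorded on the diagnosis day if there is one, otherwise every BMI inside the window. The function name and the (age_in_days, bmi) layout are hypothetical, and reading "12-month window" as 365 days on either side of the diagnosis is an assumption; adjust window_days if a different span is intended.

```python
def select_bmis(measurements, dx_age_days, window_days=365):
    """Pick BMI values for a case.

    measurements: list of (age_in_days, bmi) tuples.
    Returns the measurements taken on the diagnosis day if any exist;
    otherwise all measurements within window_days of the diagnosis
    (assumption: the window extends both before and after).
    """
    on_dx_day = [m for m in measurements if m[0] == dx_age_days]
    if on_dx_day:
        return on_dx_day
    return sorted(m for m in measurements if abs(m[0] - dx_age_days) <= window_days)


exact = select_bmis([(1000, 18.2), (1100, 19.0)], dx_age_days=1000)
windowed = select_bmis([(900, 17.5), (1500, 20.1), (1200, 19.0)], dx_age_days=1000)
```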

Re. atopic clarification: the interpretation is correct (Individuals with ICD9 code 493.0 OR Individuals with ICD9 codes in Table 3).
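
The confirmed reading of the atopic rule can be sketched as follows: a subject is atopic with any 493.0 code, or with Table 3 codes on two or more separate calendar days. The Table 3 codes are placeholders here because the real list is in the algorithm document, the function name is hypothetical, and matching 493.0 by prefix (so that 493.00, 493.01, and 493.02 also count) is an assumption.

```python
from datetime import date

# Placeholders only: substitute the ICD9 codes from Table 3 of the algorithm document.
TABLE3_CODES = {"TABLE3_CODE_A", "TABLE3_CODE_B"}


def is_atopic(dx_records, table3_codes=TABLE3_CODES):
    """dx_records: iterable of (icd9_code, calendar_date) tuples."""
    table3_days = set()
    for code, day in dx_records:
        # Assumption: 5-digit subcodes of 493.0 qualify as well.
        if code.startswith("493.0"):
            return True
        if code in table3_codes:
            table3_days.add(day)
    # Two codes on the same day count as one calendar day.
    return len(table3_days) >= 2


same_day = [("TABLE3_CODE_A", date(2011, 5, 2)), ("TABLE3_CODE_B", date(2011, 5, 2))]
two_days = [("TABLE3_CODE_A", date(2011, 5, 2)), ("TABLE3_CODE_A", date(2011, 9, 9))]
```

With this reading, same_day is not atopic and two_days is.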

Hope this helps,


I just want to make sure I am understanding this correctly.

For cases, you want only one value for BMI but for controls you want a repeating variable file with all possible BMI values with associated date?

Would it just be easiest for everyone if we gave you all BMI values with age in days then you could pick whichever BMI you wanted for ea. person?  

For most if not all of our eMERGE phenotypes we usually have some sort of minimum data requirement to be sure we are getting true controls, i.e., absence of data does not imply absence of disease. For example, in Northwestern's asthma algorithm we required controls to have at least a Dx and/or an Rx on 2 different dates in their EMR to show that they were seen at least 2 different times by someone.  Some of our controls for the current CHOP asthma algorithm have zero or only 1 in-person encounter in the EMR or no visits in the last 5 years, meaning we probably shouldn't be using them as controls as they may have asthma and we don't know it b/c we have never really seen them or only seen them once, or have not seen them recently enough.

Was there something like this in the control definition that I'm missing?

If not, ~600 of our ~1900 potential controls have not been seen in the last 5 yrs or have <2 in-person visits overall. I really don't want to say those are true controls.



Excellent point and not something I had considered. I think it is too late to modify this for the other groups, but your suggestion makes total sense. Would it be too much to ask you to flag those that have <2 visits and also those that have not been seen in 5 years? We would still like the 1900, but these new variables could at the very least be covariates.
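
For other sites that want to supply the same covariates, the two flags could be derived along these lines. This is only a sketch: the function and flag names are hypothetical, counting visits as distinct calendar days is an assumption, and so is measuring the 5 years back from an as_of date chosen by the site (for example the extraction date).

```python
from datetime import date


def control_flags(visit_dates, as_of, min_visits=2, recent_years=5):
    """Return 0/1 flags for a control given its in-person visit dates."""
    days = set(visit_dates)  # assumption: one visit per distinct calendar day
    try:
        cutoff = as_of.replace(year=as_of.year - recent_years)
    except ValueError:
        # as_of is Feb 29 and the target year is not a leap year
        cutoff = as_of.replace(year=as_of.year - recent_years, day=28)
    return {
        "lt2_visits": int(len(days) < min_visits),
        "not_seen_5y": int(not any(d >= cutoff for d in days)),
    }


as_of = date(2013, 6, 1)
recent = control_flags([date(2012, 1, 1), date(2012, 3, 1)], as_of)
stale = control_flags([date(2005, 1, 1), date(2006, 1, 1)], as_of)
never = control_flags([], as_of)
```

Flagging rather than dropping keeps all controls in the file, so the analysis can use the flags as covariates or as filters.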


We provided the covariates for you. It will be interesting to find out if they make a difference in your analysis; I suspect they might.

This is based on what we have learned in eMERGE as a network about controls being comparable to cases.

For example for this algorithm the cases are required to have visits on 2 separate days but the controls are not.  

It would not be much work for other sites to send the # visits for controls as a covariate, as this is a covariate we had to extract for cases for this algorithm already anyway, plus most of the sites have plenty of controls already, and I would say that it is never too late.  :)

FYI, for NU, if you remove controls w/ <2 visits we still have 1704 controls, and if you also remove controls w/o visits in last 5 yrs, there are still 1342 controls.