Friday, February 22, 2013
Lyam Vazquez, John Connolly. CHOP. Asthma. PheKB; 2013. Available from: https://phekb.org/phenotype/146
Abstraction/Chart Review Form?
Has any abstraction tool been developed yet?
The algorithm document mentions an attached data dictionary, but I do not see one.
I uploaded the data dictionary. However, we are still trying to map the RxNorm terminology.
Great, this will allow me to continue along with the covariates. Thanks.
Phenotype states under Case exclusion criteria:
"Individuals with any positive confirmation of 'wheezing', 'asthma' or 'asthma exacerbation' in the medical record as shown in table 3."
Should this be an NLP "record exclusion" rather than an "individual case subject exclusion", since a subject could have asthma and also have mentions of family members with asthma?
Do we just want to exclude NLP records having strings as reported in table 3?
Yes, thank you for pointing that out. It should be an NLP record exclusion as you described.
Update on case exclusion
We just deleted Bronchitis from case exclusion criteria. Thanks Jim!
Update on the algorithm
Inclusion of cases by NLP only was deleted, and some ICD9 codes were added for exclusion of cases and controls.
Data Dictionary Questions
Just need some clarification on some of the tabs:
1. Atopy - what are the encoded values for Yes, No, Unknown?
2. Hx of asthma meds
Drug code - I know you mentioned this, so what is needed here? RXCUI?
Can I get a sample set of data for this tab? The reason I ask is that our data is set up as key-value pairs. We have a component name and a value, in which the IGE is considered a component, and not all tests have an IGE component listed. I just want to make sure I get my output formatted correctly.
1. Atopy: Let's do Yes = 1, No = 0, Missing = . (a period).
2. Drug code: Yes, we will need RXCUI for the drug. We started using MedEx, the program that Josh talked about in the meeting, but we are still figuring it out.
3. Sample set of data: We can send you a sample set. Frank will work on getting the sample and sending it to you.
I will update the data dictionary with the encoded values for Atopy.
Also, I just want to clarify that in the flowchart we mention stratification by severity, but we will do that ourselves, with the data that you guys send.
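For anyone formatting the Atopy tab, the encoding agreed above (Yes = 1, No = 0, Missing = .) could be applied with something like the following. This is only a minimal sketch: the function name `encode_atopy` is hypothetical, and treating "Unknown" or any unrecognized value as missing is an assumption, not something stated in the data dictionary.

```python
from typing import Optional

# Encoding agreed in this thread: Yes = 1, No = 0, Missing = "."
ATOPY_CODES = {"yes": "1", "no": "0"}
MISSING = "."


def encode_atopy(value: Optional[str]) -> str:
    """Map a raw Yes/No value to its data dictionary code.

    Assumption: None, "Unknown", and any unrecognized value are
    reported as missing (".").
    """
    if value is None:
        return MISSING
    return ATOPY_CODES.get(value.strip().lower(), MISSING)


if __name__ == "__main__":
    for raw in ["Yes", "no", "Unknown", None]:
        print(repr(raw), "->", encode_atopy(raw))
```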
Hi, we updated the labs and CPT codes tabs of the data dictionary. We added a description and a sample set of data for reporting lab results.
Thanks, I think I am good to go now!
We have validated an algorithm in adults to capture asthma exacerbation as hospitalizations/ER visits via a PGPop/PGRN project. Should we use that instead and/or in addition to the record of ICD9s? I worry that just the ICD9 may have poorer PPV.
Thank you for the suggestion. I think it would be best to capture asthma exacerbation as hospitalizations and ER visits in addition to the ICD9 codes. If you wouldn't mind sharing the algorithm, that would also be really helpful to us.
Also - not everyone has drugs mapped to RxNorm codes. If you want RxNorm codes, can you provide the RxNorm codes for each med you want? That would standardize the listing of the meds in the med list. MedEx-UIMA should provide this if you want.
Do you want this as an EAV with repeating values for every record of the med in the chart? That could be very large for NLP-derived med lists (like ours). If you want multiple values, an option would be to record the first and last date mentioned.
I just added the RxNorm codes of asthma meds to the data dictionary. We got them from MedEx, so thank you. For us, it would be best to have an EAV with a row for every record of the med in the chart.
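To illustrate the two output shapes discussed here, the sketch below builds an entity-attribute-value (EAV) list with one row per medication mention, and, for comparison, the more compact first/last-date summary suggested earlier. The row layout (subject ID as entity, RXCUI as attribute, mention date as value), the function names, and the placeholder identifiers are all assumptions for illustration; the data dictionary remains the authority on the real column layout.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

# One medication mention found in the chart: (subject_id, rxcui, mention_date).
MedMention = Tuple[str, str, date]


def med_mentions_to_eav(mentions: Iterable[MedMention]) -> List[Tuple[str, str, str]]:
    """Emit one (entity, attribute, value) row per mention, repeats included."""
    return [
        (subject_id, rxcui, mention_date.isoformat())
        for subject_id, rxcui, mention_date in mentions
    ]


def first_last_by_med(
    mentions: Iterable[MedMention],
) -> Dict[Tuple[str, str], Tuple[date, date]]:
    """Compact alternative: first and last mention date per (subject, med)."""
    spans: Dict[Tuple[str, str], Tuple[date, date]] = {}
    for subject_id, rxcui, day in mentions:
        key = (subject_id, rxcui)
        first, last = spans.get(key, (day, day))
        spans[key] = (min(first, day), max(last, day))
    return spans


if __name__ == "__main__":
    # "S1" and "RXCUI_A" are placeholders, not real identifiers.
    sample = [
        ("S1", "RXCUI_A", date(2012, 1, 5)),
        ("S1", "RXCUI_A", date(2012, 3, 9)),
    ]
    for row in med_mentions_to_eav(sample):
        print(row)
    print(first_last_by_med(sample))
```

The EAV form grows with every mention, which is the size concern raised for NLP-derived med lists; the first/last summary keeps one entry per subject and drug.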
In the ICD9 code list, we have 518.801, 518.811 and 556 with the description 'Idiopathic Proctocolitis'. Do you want us to search for other codes?
BMI at Diagnosis Date
For BMI at Diagnosis Date, when we run this on the diagnosis date only, just 80 of our 326 cases have an actual BMI on that date. Should we use a window around the diagnosis date to pick up more BMIs, or is this what you want? If so, what window would be appropriate: 6 months, 1 year, or something else?
another BMI question
Also, which BMI do you want for controls (as they have no diagnosis date)?
The algorithm document says:
"Atopic: Individuals with ICD9 code 493.0 or two or more ICD9 codes in Table 3 on separate calendar days."
We are interpreting this as either of the following:
Individuals with ICD9 code 493.0
Individuals with ICD9 codes in Table 3 on two or more separate calendar days.
Is this correct, or how else should we interpret it?
Sorry, to us the sentence in the document is ambiguous.
If there is no BMI at time of diagnosis, please list all BMIs within a 12-month window.
Re. BMI for controls, similarly, it would be best to have all values linked with age in days.
Re. atopic clarification: the interpretation is correct (Individuals with ICD9 code 493.0 OR Individuals with ICD9 codes in Table 3).
Hope this helps,
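For sites implementing the atopic rule as confirmed above, here is a minimal sketch of that reading: a subject is atopic if they have ICD9 code 493.0, OR Table 3 codes on two or more separate calendar days. The function name `is_atopic` is hypothetical, the Table 3 code set has to be supplied from the algorithm document, and matching 493.0 by prefix (so that five-digit subcodes also count) is an assumption.

```python
from datetime import date
from typing import Iterable, Set, Tuple

# One diagnosis record: (icd9_code, date_of_service).
DxRecord = Tuple[str, date]


def is_atopic(dx_records: Iterable[DxRecord], table3_codes: Set[str]) -> bool:
    """Return True if the subject meets the atopic rule as read in this thread."""
    table3_days = set()
    for code, day in dx_records:
        # Assumption: any code beginning with "493.0" counts as 493.0.
        if code.startswith("493.0"):
            return True
        if code in table3_codes:
            table3_days.add(day)
    # Two or more separate calendar days with a Table 3 code.
    return len(table3_days) >= 2


if __name__ == "__main__":
    # "T3-A" and "T3-B" are placeholders, not real Table 3 codes.
    table3 = {"T3-A", "T3-B"}
    same_day = [("T3-A", date(2012, 5, 1)), ("T3-B", date(2012, 5, 1))]
    two_days = [("T3-A", date(2012, 5, 1)), ("T3-A", date(2012, 6, 1))]
    print(is_atopic(same_day, table3))
    print(is_atopic(two_days, table3))
```

Note that two Table 3 codes recorded on the same calendar day do not satisfy the second condition under this reading.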
BMI cases and controls
I just want to make sure I am understanding this correctly.
For cases, you want only one value for BMI but for controls you want a repeating variable file with all possible BMI values with associated date?
all BMI easiest?
Would it be easiest for everyone if we gave you all BMI values with age in days, so you could pick whichever BMI you wanted for each person?
That'd be great, yes.
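A minimal sketch of the "all BMI values with age in days" output agreed here, for cases and controls alike. The function name and the input shape (a birth date plus a list of dated BMI measurements) are assumptions; each site's extract will look different.

```python
from datetime import date
from typing import Iterable, List, Tuple


def bmi_with_age_in_days(
    birth_date: date,
    measurements: Iterable[Tuple[date, float]],
) -> List[Tuple[int, float]]:
    """Return (age_in_days, bmi) pairs for one subject, sorted by age."""
    rows = [
        ((measured_on - birth_date).days, bmi)
        for measured_on, bmi in measurements
    ]
    return sorted(rows)


if __name__ == "__main__":
    rows = bmi_with_age_in_days(
        date(2000, 1, 1),
        [(date(2001, 1, 1), 17.2), (date(2000, 1, 11), 14.0)],
    )
    for age_days, bmi in rows:
        print(age_days, bmi)
```

Reporting age in days rather than calendar dates lets the receiving site pick the BMI closest to diagnosis, or apply any window, without the sending site sharing dates.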
controls have minimum data in EMR?
For most if not all of our eMERGE phenotypes we usually have some sort of minimum data requirement to be sure we are getting true controls, i.e., absence of data does not imply absence of disease. For example, in Northwestern's asthma algorithm we required controls to have at least a Dx and/or an Rx on 2 different dates in their EMR to show that they were seen at least 2 different times by someone. Some of our controls for the current CHOP asthma algorithm have zero or only 1 in-person encounter in the EMR or no visits in the last 5 years, meaning we probably shouldn't be using them as controls as they may have asthma and we don't know it b/c we have never really seen them or only seen them once, or have not seen them recently enough.
Was there something like this in the control definition that I'm missing?
If not, we have ~600 of our ~1900 potential controls that have not been seen in the last 5 yrs or have <2 in-person visits overall. I really don't want to say those are true controls.
Excellent point and not something I had considered. I think it is too late to modify this for the other groups, but your suggestion makes total sense. Would it be too much to ask you to flag those that have <2 visits and also those that have not been seen in 5 years? We would still like the 1900, but these new variables could at the very least be covariates.
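The two flags requested above could be derived along these lines. This is a sketch under stated assumptions: the function and field names are hypothetical, each distinct calendar day counts as one visit, and the reference date for "the last 5 years" is whatever extraction date the site chooses. What counts as an in-person visit is left to each site's EMR.

```python
from datetime import date
from typing import Dict, Iterable, Union


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def control_visit_flags(
    visit_dates: Iterable[date],
    as_of: date,
    min_visits: int = 2,
    recency_years: int = 5,
) -> Dict[str, Union[int, bool]]:
    """Compute covariate flags for one control from their visit dates."""
    distinct_days = set(visit_dates)  # assumption: one visit per calendar day
    cutoff = years_before(as_of, recency_years)
    seen_recently = any(day >= cutoff for day in distinct_days)
    return {
        "n_visits": len(distinct_days),
        "lt_min_visits": len(distinct_days) < min_visits,
        "not_seen_recently": not seen_recently,
    }


if __name__ == "__main__":
    print(control_visit_flags([date(2005, 3, 1)], as_of=date(2013, 2, 22)))
```

Keeping the visit count alongside the two flags lets the analyst try other thresholds without asking sites to re-extract.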
controls comparable to cases w.r.t. # visits
We provided the covariates for you. It will be interesting to find out if it makes a difference in your analysis; I suspect it might.
This is given what we have learned in eMERGE as a network, about controls being comparable to cases.
For example for this algorithm the cases are required to have visits on 2 separate days but the controls are not.
It would not be much work for other sites to send the # visits for controls as a covariate, since we already had to extract this covariate for cases for this algorithm anyway. Most of the sites have plenty of controls already, and I would say that it is never too late. :)
FYI, for NU, if you remove controls w/ <2 visits we still have 1704 controls, and if you also remove controls w/o visits in last 5 yrs, there are still 1342 controls.