Pneumonia- VUMC eMERGE v5.1

Identify bacterial pneumonia, similar to the phenotype reported in the literature with genetic association risk in CD143 and TLR4 A229G.

Phenotype ID: 
Do Not List on the Collaboration Phenotypes List
Type of Phenotype:
Contact Author: Andrea Ramirez, MD

Suggested Citation

Andrea Ramirez, MD. Vanderbilt University Medical Center. Pneumonia- VUMC eMERGE v5.1. PheKB; 2018. Available from:


Submitted by Cong Liu on

Hi, Table 3 (exclusion criteria) is empty. Could you update it? Thanks


Submitted by Cong Liu on

Hi, in CASES step 4, "two of same code or two from same bin": does that mean two codes from the same criterion? For example, two codes from Asthma?

Yes, that means either two of the same code (i.e., two instances of "493.2", Chronic obstructive asthma) or one "493.2" and one "493.9" (Asthma, unspecified type, unspecified), which both belong to the same criterion.
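The "two of same code or two from same bin" rule can be sketched as follows. This is a minimal illustration, not the site's implementation: it assumes each diagnosis code has already been mapped to its criterion (bin), and the `CODE_TO_BIN` mapping and `bins_meeting_rule` helper are hypothetical, listing only the two asthma codes discussed above.

```python
from collections import Counter

# Hypothetical code-to-bin mapping; a real implementation would load the full inclusion list.
CODE_TO_BIN = {
    "493.2": "Asthma",  # Chronic obstructive asthma
    "493.9": "Asthma",  # Asthma, unspecified type, unspecified
}

def bins_meeting_rule(codes, code_to_bin=CODE_TO_BIN, min_count=2):
    """Return the bins that have at least `min_count` codes.

    Two copies of the same code and two different codes from the same bin
    both count, because every code is collapsed to its bin before counting.
    """
    counts = Counter(code_to_bin[c] for c in codes if c in code_to_bin)
    return {b for b, n in counts.items() if n >= min_count}
```

For example, both `["493.2", "493.2"]` and `["493.2", "493.9"]` satisfy the Asthma criterion, while a single `"493.2"` does not.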

Submitted by Cong Liu on

Hi, there is no data dictionary for this implementation. Could you provide one? Thanks. - Cong

We uploaded four data dictionaries. Please let us know if you have any questions.

Submitted by Cong Liu on

We at Columbia are the secondary site for this phenotype. Could you please provide review instructions for this phenotype? Thank you.


Submitted by Cong Liu on

Hi there,

I have the following data dictionary (DD) questions.

1. The definitions of case and control are not clear in algorithm_v2. Is a case defined as a subject who meets all of criteria 1 to 4?

2. Could you please specify what counts as "Pneumonia Positive" through radiology reports and as "Pneumonia Positive" through ICDs?

3. There are three columns in DD1 named "PNA_EVT_N". Please rename them so that each name is distinct.

4. What are "N_EXCL_CASES" and "N_EXCL_CTRLS" in DD4?

Thank you.


Hi Cong, please see the new algorithm with review instructions, which may make the data dictionary clearer. Please let me know if you have further questions. Many thanks, Andrea

Hi Cong, apologies, no, it is v3. Please find it there now; I had put it on the wrong site. Thanks, Andrea

Actually, the file is named v4 because we added the review instructions at the end; otherwise it is the same as the v3 algorithm and is titled v3 inside the document. It should correspond to the data dictionaries. Thanks

My input on the algorithm is that a couple of steps could be reversed: Step 2 (inclusion based on codes) should come first, THEN Step 1 (identify radiology reports and look for the keyword).
At least here at CCHMC, this is necessary because I don't already have the linked radiology reports, so reversing the steps reduces the set of reports that must be extracted.

Thank you for your input. I'm not sure reversing the steps would yield exactly the same results, since, at least here at Vanderbilt, not all subjects who have radiology reports also have ICDs, and vice versa. But if you don't have the radiology reports, that could be another way to do it: you could define the window based on ICDs, then look for radiology reports around it, then for antibiotics.
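A code-first ordering like the one described above might look roughly like this sketch, which uses the ICD dates to decide which radiology reports to extract. It assumes ICD and report dates are already available per subject; `reports_near_codes` and its `window_days` value are hypothetical placeholders, not parameters taken from the algorithm, and, as noted, this ordering may not reproduce the radiology-first results exactly.

```python
from datetime import date, timedelta

def reports_near_codes(icd_dates, report_dates, window_days=30):
    """Keep only radiology reports dated within +/- window_days of any qualifying ICD date.

    window_days is a hypothetical placeholder; choose it to match the event
    window your site settles on when running the steps code-first.
    """
    window = timedelta(days=window_days)
    return sorted(
        r for r in set(report_dates)
        if any(abs(r - d) <= window for d in icd_dates)
    )
```

Only the reports this returns would then be searched for the keyword, followed by the antibiotic check.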


I was wondering if you could clarify the first step in control ascertainment per the algorithm:

"Include any subjects to those who meet the medical home definition 3 or more primary care visits in 2 years (from published eMERGE BPH algorithm)."

Which segment of the BPH algorithm do we need to implement to filter patients by "3 or more primary care visits in 2 years"? We appreciate your help. Thanks.


-Mayo Clinic



Hi Ozan,

The BPH algorithm mentions it, but not in great detail. The implementation code will depend on the site, its database schema, and what it considers primary care or primary-care-like departments (internal medicine, etc.). Here we have a list of 160 departments that fit that description. Some sites already have that definition implemented, so you may want to check whether there is an existing view or table of medical home subjects. Otherwise, you can implement it by looking for subjects who have at least 3 encounters within 2 years in your set of primary care or primary-care-like departments.

You can write code that, while restricting to primary-care-like departments, sorts each subject's encounters by date and, for each encounter, looks 2 encounters ahead and asks whether that encounter is less than two years later. If any answer is yes, you count the subject; otherwise you do not. Please let me know if you have any more questions about it.
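The "look 2 encounters ahead" check described above can be sketched as follows. It is a minimal illustration assuming visits are already restricted to your site's primary-care-like departments; `meets_medical_home` is a hypothetical helper, and two years is approximated as 730 days (leap days ignored).

```python
from datetime import date, timedelta

# Approximation (assumption): two years treated as 730 days.
TWO_YEARS = timedelta(days=730)

def meets_medical_home(visit_dates, min_visits=3, span=TWO_YEARS):
    """Return True if some run of `min_visits` consecutive visits fits inside `span`.

    For each visit, look (min_visits - 1) visits ahead in date order and check
    whether that visit is less than `span` later. Subjects with fewer than
    `min_visits` visits never qualify.
    """
    dates = sorted(visit_dates)
    step = min_visits - 1
    return any(dates[i + step] - dates[i] < span for i in range(len(dates) - step))
```

Comparing each visit only with the visit two positions later is sufficient: if any three visits fall within two years, some three consecutive visits (in date order) do as well.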



Hello Aziz, 

To clarify further: in previous eMERGE algorithms that used this medical home definition, it was defined as "... primary care within your study site defined as >= 2 visits with a primary care provider over a minimum of a 3 year[s] ...," which makes more sense to us, since a healthy control may see their doctor only once a year for an annual physical exam, and sometimes a little more than 1 year may pass between such exams. Does that match what you are trying to achieve?

Thank you, Jen & Anika 

PNA_Data_Dictionary_3.csv and PNA_data_dictionary_4.csv both incorporate numeric field names (varname).  As far as I know, SAS does not allow for varnames that start with a number or contain a decimal point. Is it possible to rename your field names? Thanks! 

Hi Jim,

Thanks for your feedback. I believe you meant Data Dictionaries 2 and 3, the ones that contain ICD codes and antibiotic CUIs. We didn't implement in SAS, so we didn't face that issue, but yes, it would be possible to add a character prefix to those field names if it's causing a problem. However, I'm not sure how many other sites use SAS, and some sites have already completed the implementation while others have already started, so changing the field names for everybody might inconvenience others at this point. Would it be possible for you to populate the data dictionaries and then rename the fields before submitting to PheKB?
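If you work in SAS, a renaming step like the following sketch could turn the numeric field names into valid names while you populate the dictionaries. It assumes the SAS V7 naming rules (letters, digits, and underscores only; first character a letter or underscore; at most 32 characters). The `C_` prefix and the `sas_safe_name` helper are hypothetical.

```python
import re

def sas_safe_name(name: str, prefix: str = "C_") -> str:
    """Turn a field name such as '493.2' into a SAS V7-style variable name.

    Characters other than letters, digits, and underscores become underscores,
    the (hypothetical) prefix is added when the name does not start with a
    letter or underscore, and the result is cut to 32 characters.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = prefix + cleaned
    return cleaned[:32]
```

Keeping a dictionary of `{sas_safe_name(n): n}` lets you restore the original field names before submitting. Check it for collisions, since names such as "493.2" and "493_2" would map to the same SAS name.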

Could a case qualify by having exactly one 481 diagnosis code and one 486 diagnosis code within the 31-day window, or must a case have two or more of the same diagnosis code within the 31-day window to qualify?

Your inclusion and exclusion diagnosis CSV files (Pneumonia_inclusions_list40.csv and Pneumonia_exclusions_list40.csv) have had the leading and trailing zeroes trimmed from the diagnosis codes (e.g., exclusion diagnosis code 010.00 is listed as 10 in Pneumonia_exclusions_list40.csv). Is it possible to fix these files and repost them to PheKB? Thanks!

Hi Jim,

I don't recall the leading and trailing zeroes being trimmed when I generated the files; they may have been stripped and inadvertently autosaved when someone opened the files in Excel. It should be fixed now. I also added a code_type field to distinguish between ICD-9 and ICD-10 codes to facilitate implementation. Thanks!
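One way to avoid this kind of trimming when loading the lists is to keep every code as text. Python's standard `csv` module never converts field types, so "010.00" stays "010.00". The sketch below assumes hypothetical column names `code` and `code_type` and example `code_type` values of "ICD9" and "ICD10". Check the headers and values in the reposted files.

```python
import csv
import io

def load_codes(csv_text: str):
    """Read a code list as text so codes like '010.00' keep their zeroes.

    Returns {code_type: set_of_codes}. The column names 'code' and 'code_type'
    are assumptions; adjust them to the headers in the posted files.
    """
    by_type = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_type.setdefault(row["code_type"], set()).add(row["code"].strip())
    return by_type
```

Grouping by `code_type` also keeps ICD-9 and ICD-10 codes that happen to share the same characters from being matched against the wrong coding system.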


We don't have medication CUIs in our database. Is there another file of medication names we can use, or can we search for the drug names to find the medication?



In PNA_Data_dictionary_1, do you want us to include "all" PNA events for the cases? What if a case has multiple events, and one PNA event qualifies while another does not? Do you want us to include the PNA event that did not qualify? If so, what do we list as the case_control status for the excluded event?

Yes, one case can have multiple events. If one qualifies, keep that event and its details; if another event doesn't qualify, drop that event. That subject would be a case with its qualifying events listed, not a control. Thanks, and let me know if you have any further questions.
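The per-subject rule above can be sketched like this, assuming each event record already carries a boolean `qualifies` flag (a hypothetical field name). `summarize_subject` is likewise hypothetical. A subject without qualifying events is reported only as "not a case", because control status is decided by the separate control criteria.

```python
def summarize_subject(events):
    """Keep only qualifying PNA events and classify the subject.

    A subject with at least one qualifying event is a case, listed with only
    those events; non-qualifying events are dropped rather than reported as
    controls. Control status is decided separately by the control criteria.
    """
    kept = [e for e in events if e["qualifies"]]
    status = "case" if kept else "not a case"
    return status, kept
```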

Do you have a list of radiology report names that could contain a possible PNA event? Would CTs, MRIs, and X-rays all qualify?

Yes, all of those radiology reports may have qualifying pneumonia events. It's OK to search other radiology reports, as they're unlikely to contain the word pneumonia if you're searching the findings and impression of the note, not the indication. Sorry, every institution will have different names for individual reports. Please let me know if you have any further questions. Best wishes, Andrea Ramirez
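Restricting the keyword search to the findings and impression while skipping the indication could be sketched as below. The section headers ("INDICATION:", "FINDINGS:", "IMPRESSION:") are assumptions about report layout, which will differ between institutions. This only detects a mention and does not handle negation.

```python
import re

# Hypothetical section headers; real report layouts differ between institutions.
SEARCHED_SECTIONS = ("FINDINGS", "IMPRESSION")
HEADER = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)

def mentions_pneumonia(report: str) -> bool:
    """Look for 'pneumonia' only in the findings and impression sections,
    skipping the indication (the reason the study was ordered).
    Negation is not handled here."""
    headers = list(HEADER.finditer(report))
    for i, h in enumerate(headers):
        # A section runs from the end of its header to the start of the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(report)
        if h.group(1).strip() in SEARCHED_SECTIONS:
            if re.search(r"\bpneumonia\b", report[h.end():end], re.IGNORECASE):
                return True
    return False
```

With this layout, "INDICATION: rule out pneumonia" alone does not count as a mention, but "IMPRESSION: Right lower lobe pneumonia." does.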

Do you have specific ICD or CPT codes you want us to use for pregnancy, or a time frame based on the codes to account for the 9 months a woman is pregnant? The Metformin response algorithm used certain CPT and ICD codes and a 10-month window; is it okay to use the same pregnancy codes that they did?


Hi Anika, the pneumonia exclusions list has the pregnancy codes as part of exclusion Group A; it is a ±1 year window. Please let me know if you have any further questions. Thanks, Andrea
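The ±1 year window check could be sketched as follows. The reply does not say what the window is centred on, so this assumes the PNA event's Time 0, and it approximates one year as 365 days. The helper only counts pregnancy codes inside the window; whether one or two such codes trigger the exclusion depends on the bin rule your site applies to Group A. `pregnancy_codes_in_window` is a hypothetical name.

```python
from datetime import date, timedelta

# Assumptions: window centred on the PNA event's Time 0; one year approximated as 365 days.
ONE_YEAR = timedelta(days=365)

def pregnancy_codes_in_window(time0, pregnancy_code_dates, window=ONE_YEAR):
    """Count Group A pregnancy-code dates within +/- window of Time 0.

    Only the window check is done here; apply your site's bin rule to the count.
    """
    return sum(1 for d in pregnancy_code_dates if abs(d - time0) <= window)
```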


I have a question about antibiotics and Data Dictionary 3 for the pneumonia algorithm. In the file “PNAAbxCuis.xlsx”, there are 5 different listings for amoxicillin (codes 8698, 723, 1753, 19791, 48203). But the data dictionary requests the number of antibiotic mentions for each code. So if we find someone had amoxicillin, which code should we apply for that instance? I gave amoxicillin as an example, but it's not the only medication that is mapped to multiple CUIs. Thank you for your help.



Thanks Ozan. The individual CUIs in column B can be mapped back to the drug name in column A and collapsed in the count: the count is the number of days with any mention of that column A drug name. Please let me know if you have any other questions. Thanks
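The collapsing described above can be sketched like this. It assumes mentions arrive as `(date, cui)` pairs and that a `cui_to_drug` dictionary has been built from columns B and A of PNAAbxCuis.xlsx. `abx_days_by_drug` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

def abx_days_by_drug(mentions, cui_to_drug):
    """Count the days with any mention of each column A drug name.

    `mentions` is an iterable of (date, cui) pairs; `cui_to_drug` maps each
    column B CUI to its column A drug name. Several CUIs for the same drug on
    the same day count once; CUIs not in the list are ignored.
    """
    days = defaultdict(set)
    for day, cui in mentions:
        drug = cui_to_drug.get(cui)
        if drug is not None:
            days[drug].add(day)
    return {drug: len(d) for drug, d in days.items()}
```

For example, two different amoxicillin CUIs mentioned on the same day add one day, not two, to the amoxicillin count.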


In our database, the medications Vanceril and Vancenase are not listed as antibiotics. Could you confirm whether these medications should be counted as antibiotics for the pneumonia algorithm?


Thanks Anika, good catch. Those shouldn't be there; they are inhaled steroids. Please exclude them since you're still implementing. I won't re-release the list, because they should be very low frequency, and we can use the data dictionary on the back end to count and drop any cases that qualified only this way. Thanks again
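The back-end check described above could be sketched as follows. It assumes each case's qualifying antibiotic names can be recovered from data dictionary 3. `qualifies_without_steroids` is a hypothetical helper.

```python
# Drug names identified above as inhaled steroids rather than antibiotics.
NOT_ANTIBIOTICS = {"vanceril", "vancenase"}

def qualifies_without_steroids(abx_names):
    """True if the case still has at least one real antibiotic after Vanceril
    and Vancenase are removed; cases that qualified only through those two
    names would be dropped."""
    return any(name.lower() not in NOT_ANTIBIOTICS for name in abx_names)
```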

Hello, it's not particularly clear in the algorithm how PNA events other than the first should be defined. Should we be looking for the next non-negated mention anytime after the 5 months following the first non-negated mention (PNA Event 1)? And then create the 6-month window around that second non-negated mention such that the 1 month prior (baseline) might overlap with the 5 month post (follow-up) of the preceding event? Please confirm or clarify if otherwise. Thanks.

Hi Ladia, yes. Any non-negated mention within the 5 months following the first falls in that PNA event; the next non-negated PNA mention after that would be eligible to start a new window and declare a new PNA event. Please let me know if this helps. Thanks, Andrea
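The event-grouping rule in this reply can be sketched as follows. It assumes the inputs are the dates of non-negated mentions and treats a mention exactly 5 calendar months after the event's first mention as still inside that event; that boundary choice is an assumption. `add_months` and `group_pna_events` are hypothetical helpers. The 1-month baseline before each event is not modelled here, so it may overlap the previous event's follow-up as described in the question.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day (Jan 31 + 1 month -> Feb 28/29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def group_pna_events(mention_dates, follow_up_months=5):
    """Group non-negated mention dates into PNA events.

    A mention within `follow_up_months` of the current event's first mention
    joins that event; the first mention after that window starts a new event.
    """
    events = []
    for d in sorted(mention_dates):
        if events and d <= add_months(events[-1][0], follow_up_months):
            events[-1].append(d)
        else:
            events.append([d])
    return events
```

Each new event's window is measured from its own first mention, not from the last mention of the previous event.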

Hello, for exclusions, is a "bin" defined as either the Group A or the Group B conditions? Or is a "bin" each specific condition (e.g., "Addiction" is a separate bin from "Alcoholism" even though both are listed under Group A)? In other words, does a person need two codes for, e.g., Addiction, or just two codes for any of the conditions in Group A? Also, the possible values listed in DD4 for EXCL_BIN do not include all the categories listed in the Exclusions Table in the Algorithm document. Is that intentional? Thank you.

Hi Ladia, a bin is within a group. So yes, 'Addiction' is a bin that would require 2 codes to trigger an exclusion in Group A; 1 code from Addiction and 1 from a different bin within Group A would not.

Good catch on DD4 for the exclusions. Other sites have simply added the other categories from the algorithm document when completing DD4. Please do the same, and we will go back and update the DD file.


Thank you


We did NLP on the radiology notes, and I wanted to make sure we should count the event as "positive" if the note says something along the lines of "representing atelectasis or pneumonia".



Thanks Anika, yes, 'representing' would be non-negated. We used the NegEx package and only treated definitely negated mentions as negated. Thanks
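NegEx itself uses a much larger set of trigger phrases and scope rules. The following is only a simplified stand-in, not NegEx. It assumes a short, hypothetical list of definite negation triggers and a five-word look-back, and it illustrates why "representing atelectasis or pneumonia" stays positive while "no evidence of pneumonia" is negated.

```python
import re

# Hypothetical, deliberately short list of definite pre-negation triggers (not the NegEx list).
NEGATION_TRIGGERS = ("no evidence of", "negative for", "without", "no", "not")

def is_negated(sentence: str, term: str = "pneumonia", window: int = 5) -> bool:
    """Return True if a trigger phrase appears within `window` words before `term`.

    Returns False when the term is absent (nothing to negate), so check for a
    mention separately. Uncertain wording such as 'representing' or 'possible'
    is deliberately not treated as negation.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if term not in tokens:
        return False
    idx = tokens.index(term)
    # Pad with spaces so triggers match whole words only.
    preceding = " " + " ".join(tokens[max(0, idx - window):idx]) + " "
    return any(f" {trigger} " in preceding for trigger in NEGATION_TRIGGERS)
```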

For data dictionaries 2 and 3, controls will have missing data for all codes since they have no "events" and thus no time frame within which to count the presence of codes. As such, is it acceptable to leave them out of the tables altogether since you'll have all their data in DD1? Please confirm or clarify; the documentation is currently unclear and conflicting.

Yes, data dictionaries 2 and 3 are not required for controls, because they are not mentioned in the algorithm's section on controls. We can try to make that clearer. Thanks


When determining whether a patient was admitted for hospitalization, the pseudocode says to find the hospitalization closest to Time 0 for each event. Is there a time limit (i.e., if the only hospitalization is 10 years before or after, do you still want it)? In other words, when do we say admitted=no for a given event?

What if there are 2 events and the same hospitalization is the closest one to each of them? Is that okay?

Thanks Anika. The hospitalization was intended to be reported only if the patient was hospitalized at the time identified as Time 0. So if Time 0 was January 3 and the hospitalization started January 1, that is hospital day 3, and if the patient was discharged January 10, that is a 10-day hospitalization. Does that help? Thanks
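The worked example above (Time 0 on January 3, admission January 1, discharge January 10) can be reproduced with inclusive day counting, where the admission day is day 1. This is a sketch: `hospital_day_and_length` is a hypothetical helper, and the year in the usage is arbitrary because the example gives none. Returning `None` corresponds to admitted=no, meaning Time 0 does not fall within any stay.

```python
from datetime import date

def hospital_day_and_length(time0, admit, discharge):
    """Return (hospital day at Time 0, length of stay in days), counting the
    admission day as day 1, or None if Time 0 falls outside the stay
    (i.e. admitted = no for this event)."""
    if not (admit <= time0 <= discharge):
        return None
    return (time0 - admit).days + 1, (discharge - admit).days + 1
```

For example, `hospital_day_and_length(date(2016, 1, 3), date(2016, 1, 1), date(2016, 1, 10))` gives `(3, 10)`, matching hospital day 3 of a 10-day hospitalization.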