Clostridium Difficile Colitis

Clostridium difficile, also known as "C. diff," is a species of bacteria that causes severe diarrhea and other intestinal disease when competing bacteria in the gut have been wiped out by antibiotics (see Wikipedia entry). In rare cases a C. diff infection can progress to toxic megacolon, which can be life-threatening. In a very small percentage of the adult population, C. difficile bacteria naturally reside in the gut. Other people accidentally ingest spores of the bacteria while they are patients in a hospital or nursing home.


True positive (for Gold standard cases): C. diff antigen / antibody positive or colonoscopy/flex sig positive for C. diff.

True positive (for Silver standard cases): the doctor believed the patient had C. diff colitis and treated them with an appropriate course of oral vancomycin (Vancocin) or metronidazole (Flagyl).


Phenotype ID: 
Type of Phenotype: 
David Carrell and Josh Denny
Date Created: 
Tuesday, March 13, 2012

Suggested Citation

David Carrell and Josh Denny. Group Health and Vanderbilt. Clostridium Difficile Colitis. PheKB; 2012 Available from:


Submitted by Josh Denny on

Case definition modification (4/19/2012):

For cases, delete this path from the algorithm: Any C.Diff test positive = No --> ICD 008.45 = Yes


Design so that we have "gold" (with antigen results) and "silver" (EMR text records suggesting diagnosis) case definitions
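As a rough illustration of the gold/silver split described above, here is a minimal sketch. The function name and its three boolean inputs are hypothetical and not part of the posted algorithm; it assumes each site has already derived those flags from its own lab results, procedure reports, and EMR text. It does not take the ICD-9 008.45 code as an input, which is one reading of the 4/19/2012 modification.

```python
from typing import Optional


def classify_case(test_positive: bool,
                  scope_positive: bool,
                  clinician_dx_and_treated: bool) -> Optional[str]:
    """Return 'gold', 'silver', or None (not a case).

    test_positive: C. diff antigen / antibody result was positive.
    scope_positive: colonoscopy or flex sig was positive for C. diff.
    clinician_dx_and_treated: EMR text suggests the doctor diagnosed
        C. diff colitis and treated with oral vancomycin or metronidazole.
    """
    if test_positive or scope_positive:
        return "gold"
    if clinician_dx_and_treated:
        return "silver"
    return None
```

A subject with a positive test is labeled gold even if the silver evidence is also present, so the two case types stay mutually exclusive.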


NU's implementation numbers:

Cases (Toxin+ or NLP/ICD9): 83 total

     Gold: 71?

     Silver: 12?

Controls: at least 317

A correction to the algorithm posted 15 August: The proton pump inhibitor exclusion should NOT be an exclusion - proton pump inhibitor medications are a covariate only, not an exclusion.

1. Regarding Rx_transplant: the DD says 'Flag indicating subject was exposed to transplant medications. Time: time 0* to index_date + 7 days', but the transplant meds tab says 'up to 21d before index_date'.

Can you clarify?


2. Cancer; DD says 'Flag indicating subject has cancer diagnosis (excluding non-melanoma skin cancer)'

Do you mean the list of hematological malignancy codes? If not, do you have a list of relevant codes?



1. For cancer, I am pulling dx ICD 9 codes 140-172.99, and 174-239.99 (excluding 173.xx non-melanoma skin cancers). If this is not correct please send a list of appropriate codes.

2. Do you want cancer (excl non-melanoma skin cancer) EVER or prior to index date? I am showing 84% prevalence in my C. diff cases and controls (EVER).



Submitted by Josh Denny on

1. I would use 140-172.99, and 174-209.99, ignoring carcinoma in situ and benign neoplasms.


2. Do not exclude patients with cancer codes, just list as a covariate.  I would limit to within 5 years of the diagnosis. (I updated the data dictionary with this info.)


3. Use the dates for transplant meds as described.
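For anyone implementing the cancer covariate from the ranges above, a minimal sketch follows. The function names are hypothetical, and the 5-year window is interpreted here as the 5 years up to and including index_date, which is an assumption; check the data dictionary for the authoritative wording. Codes are assumed to be dotted ICD-9 strings such as "162.9".

```python
from datetime import date, timedelta


def is_cancer_covariate_code(icd9: str) -> bool:
    """True for ICD-9 140-172.99 or 174-209.99 (173.xx is skipped)."""
    try:
        value = float(icd9)
    except ValueError:
        return False  # V codes, E codes, and malformed strings are out of range
    return 140 <= value < 173 or 174 <= value < 210


def cancer_flag(diagnoses, index_date, lookback_days=5 * 365):
    """diagnoses: iterable of (icd9_code, diagnosis_date) pairs.

    Assumption: only diagnoses in the lookback window ending at
    index_date count toward the covariate.
    """
    start = index_date - timedelta(days=lookback_days)
    return any(
        is_cancer_covariate_code(code) and start <= dx_date <= index_date
        for code, dx_date in diagnoses
    )
```

This is a covariate flag only; it does not exclude anyone.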

The algorithm has exclusions for bone marrow cancer for both cases and controls, so this is separate from cancer as a covariate? All we've used so far are codes for bone marrow cancer.

excl_date - Not sure what to do with this field - if I've already excluded subjects, they're not going to be in the data we send. Is this looking for the first date they had cancer/chemo/etc? If so, I'll need to know which of the covariates would potentially be exclusions and would therefore provide an excl_date.


I can answer one of these -

The bone marrow cancer exclusion is separate from cancer as a covariate - for the covariate, please use ICD9 codes 140-172.99 and 174-209.99 (this was updated in the data dictionary last week in response to another question).


I have asked around at Vanderbilt and the consensus is to ignore the exclusion date.

Another question - the PPI exposure says -21 days in the algorithm, but no time is listed on the data dictionary - does that time restriction still apply?

Can you also clarify whether the chemo field is for the 180 days prior/+7 days to the index date as the algorithm says, or for all time as the data dictionary says?


For both of these (PPI and chemo), there is an exclusionary time window and a variable recording evidence of exposure at any time outside that window. So the exclusionary times apply for excluding, but if the subject is included, collect evidence of exposure to them.

According to the algorithm there is no exclusion for PPI.

For rx_proton_pump_inhibitor and for chemotherapy covariates can you please add the specific time periods that you are interested in to these fields in the data dictionary?

Also noted is that there are different exclusion time periods for chemotherapy (cases versus controls).

Cases (-180 days to index_date + 7 days) and Controls (-90 days to index_date). Is this correct?

Sorry for confusion - PPI was removed as an exclusion, so the dates 


Chemo Exclusion time for cases and controls should match and should both be:   (-180 days to index_date + 7 days) 

PPI is NOT an exclusion - sorry for my mistake!

The time period does apply - I'll upload a dd soon with that consistently represented
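The chemotherapy exclusion window stated above (-180 days to index_date + 7 days, the same for cases and controls) can be sketched as a simple date check. The function names are hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import date, timedelta


def in_window(event_date, index_date, days_before=180, days_after=7):
    """True if event_date falls in [index_date - days_before, index_date + days_after]."""
    start = index_date - timedelta(days=days_before)
    end = index_date + timedelta(days=days_after)
    return start <= event_date <= end


def chemo_exclusion(chemo_dates, index_date):
    """True if any chemotherapy exposure falls inside the exclusion window."""
    return any(in_window(d, index_date) for d in chemo_dates)
```

Exposures outside the window do not exclude the subject; they would still be reported through the chemotherapy covariate.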

This comment is being made on behalf of Bahram Namjou-Khales at Cincinnati:

As you know, one of the exclusion criteria for adult C. diff is the presence of chemotherapy and/or HIV. In pediatric cases >2 years old, there are additional immunodeficiency conditions that we don't usually see in adults, for example genetic immunodeficiency syndrome (SCID) or aplastic anemia. I think we should add immunodeficiency syndromes or aplastic anemia as another exclusion criterion in the pediatric population. Opinions?

Submitted by Josh Denny on

Not being a pediatrician, a little hard to comment, but it seems a good idea.  How about excluding the following ICD9s:

  • 279.* (SCID, agammaglobulinemia, etc.)
  • 284.* (aplastic anemia)
  • 288.0, 288.1, 288.2, 288.4, 288.5 (various leukopenias)

To keep things simple, you could see the impact of just excluding if a patient ever had one of the diagnoses.  If it excludes a lot of patients, we could revise.
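The suggested code list could be checked with something like the sketch below. The names are hypothetical, codes are assumed to be dotted ICD-9 strings (e.g. "288.03"), and five-digit subcodes of the listed 288.x codes are treated as matches, which is an assumption.

```python
# Whole three-digit categories to exclude: 279.* and 284.*
EXCLUDED_CATEGORIES = ("279", "284")
# Specific leukopenia codes; subcodes such as 288.03 match by prefix
EXCLUDED_LEUKOPENIA_PREFIXES = ("288.0", "288.1", "288.2", "288.4", "288.5")


def is_immunodeficiency_exclusion(icd9: str) -> bool:
    """True if a dotted ICD-9 code is on the proposed pediatric exclusion list."""
    code = icd9.strip()
    category = code.split(".")[0]
    if category in EXCLUDED_CATEGORIES:
        return True
    return code.startswith(EXCLUDED_LEUKOPENIA_PREFIXES)


def exclude_patient(icd9_codes) -> bool:
    """Exclude if the patient EVER had one of the listed diagnoses."""
    return any(is_immunodeficiency_exclusion(c) for c in icd9_codes)
```

Counting how many subjects `exclude_patient` removes would show whether the "ever" rule is too broad.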

There are three date fields defined in the first tab of the data dictionary: index_date, discharge_date, exclusion_date.  This presents a problem for us as the closest data we have to a subject date is year of birth. 

All EMR events are tied to age in days so we are able to apply the time range constraints in the algorithm.

We could assume each subject's birth date is 1/1 or 7/1, for example, to get an approximate date.  Is this acceptable, or will a different approach be more useful?

Hi Frank - your plan to assume a birth date for each subject and calculate an index_date based on that and the age (in days) of the EMR event sounds fine (that's my understanding of the plan - please correct me if I have misunderstood).

It might make most sense to use 7/1.


~~~ sarah 
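A minimal sketch of the approach discussed above: combine the year of birth with an assumed 7/1 birth date and add the event's age in days. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def approximate_event_date(birth_year, age_in_days,
                           assumed_month=7, assumed_day=1):
    """Approximate calendar date of an EMR event.

    The true birth date is unknown, so every subject is assumed to be
    born on assumed_month/assumed_day of their birth year.
    """
    assumed_birth = date(birth_year, assumed_month, assumed_day)
    return assumed_birth + timedelta(days=age_in_days)
```

Because every date for a subject is shifted by the same unknown offset, intervals between that subject's events (for example index_date to discharge_date) stay exact; only the absolute calendar dates are approximate.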

Thanks Sarah.  Another question - perhaps I've overlooked the answer - what two leading digits should we use for the CHOP subjid?

All covariates specified in the data dictionary are necessary for this phenotype because all are used in analyses and it is important that analyses based on new subjects match analyses that have been performed for other subjects.  You will find the data dictionary in the files section of the main page for the phenotype.  The data dictionary is named CDiff_Data_Dictionary_ForNetwork_30August2012.xlsx.


Hi, I could not upload our data sets due to the lack of data dictionaries to choose from. What should we do?

We are trying to upload the data sets and notice something in data dictionaries needs to be revised.

For Sex, C46109 (Male) and C42110 (Female) are the values for gender. It seems in this data dictionary, C46119 is used.

Some of the fields are only for cases, so it would be good to add NA = "not assessed" to avoid errors. For example, Clostriudium_difficile_casetype.

Could you revise the data dictionaries and we can try to upload the data?




Thank you for identifying these issues, Hongfang.  A revised dictionary has been uploaded.  It corrects the typo in the code indicating male sex, and adds "9=Not applicable" to the code set for Clostriudium_difficile_casetype.  Given that variable AGE_at_Clostridium_Difficile is data type "Integer, Calculated Value," I was not able to add a code indicating this field is not applicable to non-cases.  Please leave the latter field blank for non-cases.


Our bmi/ht/wt minimums go lower than allowed; they are 11.79/46 cm/5.8 kg and look legitimate. Another concern with these numbers is that for some of the older records we don't have ht/wt data within a year of the index date, and sometimes the gap is much longer. For kids this can obviously make a lot of difference - should we exclude if we don't have a measurement within a certain time period (or we could report the difference in age)? In general can we leave blank any fields where we don't have data for a patient? Other examples are days from discharge date and antibiotic risk for cases - thanks.

Sorry for the delay in responding.  Yes, because your data are from a pediatric population, please feel free to disregard the lower boundary for BMI values.  And please leave blank any BMI values for which you do not have a corresponding height measurement within the time frame specified.  Same for discharge date and antibiotic risk.
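The "leave blank" rule for BMI might be implemented along these lines. This is only a sketch: the function name is hypothetical, and the default of 365 days for `max_gap_days` is an assumption based on the "within a year of the index date" wording in the question, so use the time frame in the data dictionary instead.

```python
from datetime import date
from typing import Optional


def bmi_or_blank(weight_kg: Optional[float],
                 height_cm: Optional[float],
                 measured_on: Optional[date],
                 index_date: date,
                 max_gap_days: int = 365) -> Optional[float]:
    """Return BMI (kg / m^2), or None to leave the field blank.

    None is returned when a measurement is missing or was taken more
    than max_gap_days away from index_date. No lower bound is applied,
    since pediatric values can legitimately be low.
    """
    if weight_kg is None or height_cm is None or measured_on is None:
        return None
    if abs((measured_on - index_date).days) > max_gap_days:
        return None
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 2)
```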