A phenotype defining patients with strong evidence of having been diagnosed with colorectal cancer (cases) and patients who clearly do not have such diagnoses (controls). This phenotype is being used for sequencing studies. The only NLP involved in this phenotype is a very simple string search applied to pathology reports.
Tuesday, March 1, 2016
David Carrell and Jane Grafton. Group Health/UW. Colorectal Cancer (CRC). PheKB; 2016 Available from: https://phekb.org/phenotype/514
To identify screened controls, you require the patient to have at least one procedure code for a colonoscopy (6.1.2), and, for the not screened controls, to have no evidence of having received a colonoscopy (6.3.1). Both of these requirements reference table 6.1, but I do not see this in the document. Can you please advise? Thanks!
My apologies; Table 6.1 was inadvertently omitted from the pseudo code document. We will correct that and repost the pseudo code.
An updated version of the pseudo code, dated 8/31/2016, has been posted. It includes the (previously omitted) Table 6.1. Please note that there will be some changes to Table 6.1 within a few days, but you may begin programming to it now. Some of the codes in the list will probably be excluded.
Data dictionary updated 9/7/2016
An updated version of the data dictionary (CRC_DataDict_PersonLevel_2016_09_07.csv) was posted today. This version adds descriptions, missing from the prior version, that are needed to operationalize a few variables. We believe the dictionary is now complete, but please reach out with any questions.
Updated pseudo code with updated Table 6.1
The pseudo code document (GH_UW_CRC_Ptype_Pseudocode_2016_09_07.pdf) was updated today and now includes an updated list of codes used to identify relevant CRC screening procedures. Note that this revised set of codes is a subset of the codes originally posted 9/1/2016 (dated 8/31/2016). Let us know if you have any questions or concerns.
During validation, we noticed that ICD-9 code 555.9 is not included on the exclusion list for Crohn's/UE. We are finding cases who do have this ICD-9 code and Crohn's present in their charts. Should we keep these patients in?
Thank you for catching this issue, Katie. ICD-9 code 555.9 should be added to the list of exclusion codes in Table 4.1. We do not want subjects with this code to be eligible cases. We will update Table 4.1 in the pseudo code document for CRC and re-post it to PheKB as soon as possible.
Ready for network implementation
The CRC phenotype is ready for network-wide implementation. The pseudo code was updated as of today (adding the one overlooked ICD-9 code Kathryn found--thanks, Kathryn!). Also added was a list of ICD-10 codes, but these are unlikely to be of much use as they have not been validated and can't be validated at Group Health/UW (see note inside document "GH_UW_CRC_ICD-10_Codes_2017_01_30.docx").
I noticed in the data dictionary for covariate "Cancer_Dx_Age", there is the requirement to exclude squamous or basal cell carcinoma of the skin, melanoma in situ, carcinoma in situ of the colon or rectum, or carcinoma in situ of the cervix. Are ICD-9 codes available for this?
Additionally, are there codes available for the Radiation to the pelvis covariates?
Codes for radiation-involving imaging of the pelvis/abdomen
We are assembling a list of procedure codes for radiation-involving imaging of the pelvis/abdomen.
Question re the DD: NSAIDs
The DD (data dictionary) says “Age in years at earliest NSAID prescription fill (requires local site definition of all NSAID medications)” and “Count of days on NSAID medications (using days' supply from medication fills)”. We don’t have a local definition, and we don’t have access to medication fills, yet these are required fields. How do you suggest we proceed? Do you have a list of NSAID meds by generic name, and would a prescription for these or their existence in the (home or current) medication list, where they exist, be good enough (as that’s all we would have)?
We are working on these questions about the CRC data dictionary (DD) and will update PheKB when we have updated the DD. Thanks for your patience.
Revised data dictionary April 5, 2017
Please note that a revised data dictionary has been provided today (CRC_DataDict_PersonLevel_2017_04_05.csv). This dictionary provides additional details for covariates Cancer_Dx_Age, NSAID_Age, and NSAID_Days_Supply. Please reach out with any questions.
Are there any codes available for the radiation of the pelvis covariates (Rad_Pelvis_Age, Rad_Pelvis_Days)?
Dropping covariate for radiation of the pelvis
We have decided *not* to include among the covariates for the CRC phenotype a measure of exposure to radiation of the pelvis. We will update the data dictionary to reflect that this measure has been dropped.
PS: We are also providing a separate data dictionary to capture BMI repeated measures.
Question regarding heights, weights, BMI in DD
Your data dictionary indicates that you want height, weight, and BMI as repeated measures.
Do you want all heights, weights, and BMIs for each subject?
If so, do we send this as a separate file and include the following fields: emergeid, bmi_age, height, weight, bmi?
If you only want one BMI measure, at what timepoint do you want the measure from?
Thank you for this question. We do need repeated measurements. For simplicity, please provide all available measurements (going as far back as is reasonable given your data environment). And, yes, please do provide these measurements as a separate file, including EMERGEID, BMI_AGE, HEIGHT, WEIGHT, and BMI (as defined in the person-level DD). We will post a separate DD for BMI as soon as possible (and update the person-level DD to reflect this).
Data dictionaries revised, 4/7/2017
Please note that the data dictionaries for the CRC phenotypes have been revised as of 4/7/2017. There are now two dictionaries. The first is the person level dictionary (named CRC_DataDict_PersonLevel_2017_04_07.csv). It is exactly like the prior version except that the BMI variables have been removed. The second is the BMI repeated measures dictionary (named CRC_DataDict_BMI_Repeated_2017_04_07.csv). It contains the height, weight, and BMI variables that were previously (and incorrectly) included in the person level dictionary, plus fields EMERGEID and BMI_AGE to document the age at which each subject's measurements were taken. As noted in the description for field BMI_AGE, please provide age with two digits of precision to the right of the decimal point.
Reach out with any questions.
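For sites assembling the repeated-measures file, here is a minimal Python sketch of one way to lay it out. Only the field names (EMERGEID, BMI_AGE, HEIGHT, WEIGHT, BMI) and the two-decimal precision for BMI_AGE come from the posts above. The input record layout, the helper names, and the approximation of age as days / 365.25 are assumptions for illustration. Height, weight, and BMI are passed through as recorded, since units are defined in the DD rather than here.

```python
import csv
import io
from datetime import date

# Field names taken from the BMI repeated-measures dictionary described above.
FIELDS = ["EMERGEID", "BMI_AGE", "HEIGHT", "WEIGHT", "BMI"]


def age_years(birth_date, measure_date):
    """Age in decimal years, approximated as days / 365.25 (an assumption)."""
    return (measure_date - birth_date).days / 365.25


def bmi_rows(emerge_id, birth_date, measurements):
    """Build one output row per measurement, oldest first.

    `measurements` is a hypothetical list of dicts with keys
    'date', 'height', 'weight', and 'bmi'.
    """
    rows = []
    for m in sorted(measurements, key=lambda rec: rec["date"]):
        rows.append({
            "EMERGEID": emerge_id,
            # Two digits to the right of the decimal point, per the DD.
            "BMI_AGE": f"{age_years(birth_date, m['date']):.2f}",
            "HEIGHT": m["height"],
            "WEIGHT": m["weight"],
            "BMI": m["bmi"],
        })
    return rows


def write_bmi_csv(rows, handle):
    """Write the rows with a header line to any text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Example with made-up values for a single subject.
example_rows = bmi_rows(
    "EXAMPLE001",
    date(1950, 1, 1),
    [
        {"date": date(2010, 7, 2), "height": 170, "weight": 72, "bmi": 24.9},
        {"date": date(2010, 1, 1), "height": 170, "weight": 70, "bmi": 24.2},
    ],
)
buffer = io.StringIO()
write_bmi_csv(example_rows, buffer)
csv_text = buffer.getvalue()
```

Each subject contributes as many rows as there are measurements, so the file is keyed on EMERGEID plus BMI_AGE rather than on EMERGEID alone.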
CRC data dictionaries updated
This message is to confirm that the current (and hopefully final) versions of the CRC phenotype data dictionaries are:
The 5/2/17 person-level dictionary omits two measures related to radiation exposure of the pelvis.
Reach out with any questions.
NSAIDs and data dictionary
NSAIDs are included in the data dictionary; however, we could not find a list of NSAIDs.
Could you please clarify if this data element (NSAID) should be included in this algorithm or not?
Thank you very much
Please see the instruction on the row of the DD for this measure. It provides a method for identifying NSAIDs locally (which is an unavoidably local task, unfortunately).
We are assuming NSAID therapy is the abbreviation for non-steroidal anti-inflammatory drugs, correct?
If that assumption is correct, several different medications fall under the NSAID category. Some examples are listed below:
aspirin, celecoxib, diclofenac, ibuprofen, indomethacin.
Which medications should we use to generate the NSAID covariate listed in the data dictionary?
Could you please clarify?
Thank you very much.
That is correct--NSAID is the acronym for non-steroidal anti-inflammatory drugs. Apologies for not specifying that.
The descriptions of the NSAID measures in the data dictionary describe the method we suggest you apply locally to identify drugs in this category. From the data dictionary: "Age in years at earliest NSAID prescription fill (requires local site definition of all NSAID medications; recommend including as NSAIDs all medications where the therapeutic class contains the string “NSAID”)." As noted in a prior post on this page, it is, unfortunately, not feasible for any one eMERGE site to know what medications are used at any other eMERGE site, and this is the case for all medications, not just NSAIDs. Nor is it possible for us to share an exhaustive list of national drug codes (NDCs) for NSAIDs (or any other medication class) from a commercially available list (such as First DataBank) because that would violate the terms of our contract with the data vendor.
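To make the recommendation quoted above concrete, here is a minimal Python sketch of the string-match approach. It is an illustration under stated assumptions, not the Group Health/UW implementation: the fill record layout (`therapeutic_class`, `fill_date`, `days_supply`), the function names, the use of whole years for age, and the simple sum of days' supply (which ignores overlapping fills) are all hypothetical choices a site would need to adapt.

```python
from datetime import date


def is_nsaid(therapeutic_class):
    """True when the therapeutic class text contains 'NSAID' (case-insensitive)."""
    return "NSAID" in (therapeutic_class or "").upper()


def nsaid_covariates(fills, birth_date):
    """Return (age at earliest NSAID fill, total days' supply) for one patient.

    `fills` is a hypothetical list of dicts with keys 'therapeutic_class',
    'fill_date', and 'days_supply'. Returns (None, 0) when the patient has
    no NSAID fills; how to code that case is defined by the DD, not here.
    """
    nsaid_fills = [f for f in fills if is_nsaid(f["therapeutic_class"])]
    if not nsaid_fills:
        return None, 0
    earliest = min(f["fill_date"] for f in nsaid_fills)
    # Whole years of age at the earliest fill (assumption: integer age).
    had_birthday = (earliest.month, earliest.day) >= (birth_date.month, birth_date.day)
    age = earliest.year - birth_date.year - (0 if had_birthday else 1)
    # Assumption: overlapping fills are not de-duplicated.
    total_days = sum(f["days_supply"] for f in nsaid_fills)
    return age, total_days


# Example with made-up fills for one patient born 1950-07-01.
example_fills = [
    {"therapeutic_class": "Analgesics - NSAIDs", "fill_date": date(2010, 3, 15), "days_supply": 30},
    {"therapeutic_class": "Antihypertensives", "fill_date": date(2009, 1, 1), "days_supply": 90},
    {"therapeutic_class": "nsaid, cox-2 selective", "fill_date": date(2012, 6, 1), "days_supply": 60},
]
result = nsaid_covariates(example_fills, date(1950, 7, 1))
```

A site without fill data, as in the question earlier on this page, could apply the same filter to whatever medication records it has, provided the substitution is documented.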
Have you considered generic names of drugs and RxNorm CUI numbers?
PS: Our site will create our list of generic names. Thank you.
Thanks, Adelaide. We did not use generic names of drugs or RxNorm CUI numbers to identify NSAID medications. However, if that is the method that works best at your site (and you can defend it), please feel free to do so. Happy to discuss in person if that would be helpful.
Rituxan used to treat Secondary Thrombocytopenia
Our chart reviews uncovered an interesting scenario that turns into a case under rule 5.3. If a patient has at least one CRC dx code ever, has chemotherapy treatment-related codes, and has no other cancer dx codes, that patient becomes a case. However, if that one CRC dx code was an initial dx that was later found to be something else (like a polyp) and the patient was administered Rituxan as treatment for Secondary Thrombocytopenia, that patient would be falsely labeled as a case. How should I proceed?
Hi, Ken. Very interesting case. I will take this up with the PIs and post a response as soon as I have their input.
My inclination is to update the algorithm to "code around" this particular constellation of codes, but I'll get a definitive answer soon.
I'm guessing you did *not* review a large enough random sample to answer the following question, but if you did, please share what you know: Can you estimate the percentage of algorithm-assigned CRC cases that would turn out to be false positives because they only have Secondary Thrombocytopenia?
Thank you again, Ken, for discovering the potential for rule 5.3 of our CRC algorithm to yield false-positive CRC cases in rare circumstances. We have updated the pseudo code today (5/16/2017) to address this particular clinical pattern (presence of a CRC diagnosis code that turns out to be a rule-out code and that is followed by a diagnosis of thrombocytopenia treated with Rituxan). Rituxan is also used as a chemotherapy for CRC. The revised pseudo code dated May 16, 2017 has been posted to PheKB, and the revisions are visible as tracked changes.
To any site that has already implemented the CRC phenotype: We would appreciate your checking whether any of your patients qualifying as CRC cases by rule 5.3 also had diagnoses of thrombocytopenia (using the two diagnosis codes provided in the revised pseudo code), and if so removing them from your set of cases.
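For sites doing this re-check, a minimal Python sketch of the filtering step follows. The two thrombocytopenia diagnosis codes are deliberately left as placeholders because they are listed only in the revised pseudo code document; the record layout (`patient_id`, `qualifying_rule`) and the function name are likewise hypothetical.

```python
# Placeholders: substitute the two thrombocytopenia diagnosis codes listed in
# the revised (5/16/2017) pseudo code. They are not reproduced in this post.
THROMBOCYTOPENIA_CODES = {"THROMBO_CODE_1", "THROMBO_CODE_2"}


def recheck_rule_5_3_cases(cases, dx_codes_by_patient):
    """Split cases into (kept, removed).

    A case is removed only when it qualified by rule 5.3 AND the patient has
    at least one of the thrombocytopenia diagnosis codes. `cases` is a
    hypothetical list of dicts with 'patient_id' and 'qualifying_rule';
    `dx_codes_by_patient` maps patient_id to a set of diagnosis codes.
    """
    kept, removed = [], []
    for case in cases:
        codes = dx_codes_by_patient.get(case["patient_id"], set())
        if case["qualifying_rule"] == "5.3" and codes & THROMBOCYTOPENIA_CODES:
            removed.append(case)
        else:
            kept.append(case)
    return kept, removed


# Example with made-up patients and codes.
example_cases = [
    {"patient_id": "A", "qualifying_rule": "5.3"},
    {"patient_id": "B", "qualifying_rule": "5.3"},
    {"patient_id": "C", "qualifying_rule": "5.1"},
]
example_dx = {
    "A": {"CRC_CODE", "THROMBO_CODE_1"},
    "B": {"CRC_CODE"},
    "C": {"CRC_CODE", "THROMBO_CODE_2"},
}
kept_cases, removed_cases = recheck_rule_5_3_cases(example_cases, example_dx)
```

Cases that qualified under other rules are left untouched even if they carry a thrombocytopenia code, since the request above concerns rule 5.3 only.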
Thanks! I added the change and the issue is resolved.
As we only had 5 cases in our eMERGE cohort that followed the rules in 5.3, that one false positive case was all I could find. I have not run the algorithm on a larger cohort yet.
Good to know. -David
May 15, 2017 update to the CRC data dictionary
Andy Cagan discovered a flaw in the CRC data dictionary (thanks, Andy!) which we have corrected in the just-released 5/15/2017 version. The problem was that the dictionary instructed you to use the period (".") to indicate missing data for some integer variables, but periods, alone, are not acceptable values for integer variables. We corrected this problem in the 5/15/2017 version of the data dictionary by asking you to instead indicate missing values with "999" or "99999" values (depending on the valid range of integer values). This was the only change in the data dictionary.
Thanks again, Andy, for catching this error.
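As an illustration of the missing-value convention described in this update, here is a minimal Python sketch. The rule for choosing between 999 and 99999 (use 999 when the variable's largest valid value is below 999, otherwise 99999) is an assumption made for the example; the data dictionary row for each variable is the authority on which sentinel applies. The function name and parameters are hypothetical.

```python
def encode_integer(value, max_valid):
    """Return an integer suitable for an integer-typed DD variable.

    Missing values (None) become a sentinel instead of a period (".").
    Assumption: 999 is used when it cannot collide with a valid value,
    otherwise 99999. Check the DD row for the variable before relying on this.
    """
    sentinel = 999 if max_valid < 999 else 99999
    if value is None:
        return sentinel
    if not isinstance(value, int) or not 0 <= value <= max_valid:
        raise ValueError(f"value {value!r} is outside the valid range 0..{max_valid}")
    return value


# Example: an age-like variable and a days-like variable, both missing.
missing_age = encode_integer(None, 120)
missing_days = encode_integer(None, 20000)
known_age = encode_integer(59, 120)
```

Raising an error on out-of-range values keeps a real measurement from being mistaken for the sentinel, or the reverse.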
The table asks to exclude certain tumor histologies: is it possible to have those specified by codes? E.g., 903 for malignant mesothelioma.
Hence my concerns! Thanks.
An updated 5/17/2017 version of the pseudo code has been provided that now includes ICD-O-3 histology codes for excluded tumor histologies listed in Table 5.1 (p. 10).
5/31/2017 update to pseudo code document
Thanks to Robert Carroll's careful review of the code sets presented in the pseudo code tables (thank you, Robert!) we have identified a small number of codes that should not be included and have deleted them from the tables (see tracked changes and annotations added to the tables). These codes are for rare conditions that are not expected to impact the overall quality of the phenotype data. Nevertheless, if you have not already completed implementing the CRC pseudo code at your site, please remove the deleted codes as noted in the updated pseudo code document.
Question re case/control types
I noticed that there was no opportunity in the data dictionary to specify what type of case or type of control an individual was. This seemed especially important for controls (screened vs. unscreened). Did I miss this somewhere?