Ovarian/Uterine Cancer (OvUtCa)

The KPWA/UW-led ovarian/uterine cancer phenotype has been validated at Mayo Clinic, the secondary phenotype development site.  Validation results at both the primary and secondary sites were strong and the phenotype is ready for network-wide implementation.  The pseudo code document posted 11/30/2017 is correct as is and should be used by network sites for phenotype implementation.  A validated data dictionary of covariates for this phenotype will be added to PheKB by 2/15/2018, but sites are encouraged to begin implementing the phenotype algorithm now.

 

Suggested Citation

KPWA/UW. Ovarian/Uterine Cancer (OvUtCa). PheKB; 2017. Available from: https://phekb.org/phenotype/884

Comments

Submitted by Wei-Qi Wei on

Can you upload the DD so we can proceed with the implementation?

thanks

Is it possible to also update the document with the final list of covariates? The current one mentions that sites should not proceed until it has been updated.

It would also be helpful if the tables were uploaded as spreadsheets and not just embedded in the document, though each site can do that locally if it's infeasible.

Thanks!

Any updates on when the Data Dictionary will be delivered? The description says 2/15/2018, but we're beyond that.

Submitted by Xinnan Niu on

I think there is an error in row 42 of Table #1, which uses a nonexistent ICD-9-CM code, 220.0. It should be removed, and the corresponding ICD-10 codes (D27.0, D27.1, and D27.9) should be moved up and mapped to ICD-9-CM 220 (exact). Please let me know if you have any comments.

Thank you for catching this error.  The nonexistent ICD-9 code (220.0) should have been 221.0.  This has now been corrected (and the ICD-10 codes D27.* are associated with the correct descriptions).  -David

On March 6, 2018 we released the data dictionary for this phenotype.  We also updated the pseudo code document, which contains information referenced by the data dictionary.  Data dictionary measures of menopausal hormone therapy (MHT) are not fully defined, but we will define them as soon as possible and update the data dictionary at that time.  PLEASE COMMUNICATE ANY ISSUES YOU MAY DISCOVER IN THE DATA DICTIONARY VIA PHEKB COMMENTS FOR THIS PHENOTYPE; we will respond as soon as possible.  Thank you, -David

The Ovarian/Uterine Cancer phenotype is now ready for network-wide implementation.  An updated pseudo code document and data dictionary were posted to PheKB  on March 6, 2018.  As noted in the data dictionary, covariates for menopausal hormone therapy (MHT) are not fully defined; we will update the data dictionary with the MHT definitions within the next few days.  In the meantime, the algorithm and remainder of the covariates are ready for implementation.  If you have any questions or concerns about any part of the algorithm or data dictionary, please post them in the Comments section of PheKB for this phenotype.  (You may also email your questions to BOTH David Carrell (carrell.d@ghc.org) AND Aaron Scrol (scrol.a@ghc.org).)

At the bottom of page 5 of the algorithm (OV_CA_Pseudocode_2018_03_06.docx) you mention presence of and age at hysterectomy/oophorectomy/salpingectomy procedures for censoring controls. I do not see any of these fields being captured in the data dictionary (OV_UT_Ca_Data_Dictionary_2018_03_06.csv). Do you still wish to add these to the data dictionary? 

In table 4 of pseudocode you are referencing malignant neoplasms of female breast. For ICD10 codes you are not including the left/right breast codes, only the unspecified. Do you wish to include the specific left/right breast ICD10 codes?

C50.111 Malignant neoplasm of central portion of right female breast.

C50.112 Malignant neoplasm of central portion of left female breast.

(i.e. C50.11* instead of C50.119*, C50.21*, C50.31*, C50.41*, C50.51*, C50.61*, C50.81*, C50.91*).

Thank you for catching these errors, Jim.  We will be posting another version of the pseudo code and data dictionary today to address these issues.  -David

Submitted by Xinnan Niu on

It caught my attention that the updated OvUtCa phenotype pseudo code changes the number of cancer types from 3 to 5. However, there is an inconsistency between page 3 and page 4 of the updated pseudo code, which might cause confusion. The sentence on page 3 reads, "Flag variables are created to specify for which of each of these three types a case qualifies". On page 4, the sentence is "have a diagnosis of a qualifying cancer (ovarian,uterine,peritoneal,fallopian,or endometrial)".  In addition, I have a further comment on my previous question. We do have patients billed with ICD-9 code 182 and ICD-10 code C54, the two codes used for billing patients with "Malignant neoplasm of corpus uteri", but neither code appears in your case definition. Should we ignore these two codes?

Thank you, Xinnan, for catching the inconsistent description of the number of cancer types (the true number is five) and the omitted diagnosis codes (both ICD-9 182 [exact match] and ICD-10 C54 [exact match] should be qualifying codes for endometrial cancer).  We will update both the pseudo code and the data dictionary later today with these corrections.

Cheers,

David

Hi All,

We appreciate your eagerness to implement the Ovarian/Uterine cancer phenotypes.  We want to give you a heads up that, in addition to corrections for the above-noted errors/omissions, later today (3/9/2018) we will be posting an updated version of the data dictionary that also includes definitions for the menopausal hormone therapy (MHT) covariates, updates the oral contraceptives covariate definitions, and includes covariates for hysterectomy, oophorectomy, and salpingectomy (which are described in the pseudo code but were inadvertently omitted from the data dictionary).  Your patience is much appreciated.

Cheers,

David

Today (3/9/18) we posted updated data dictionary and pseudo code documents, addressing errors discovered by some of you (thanks much!) and items that remained incomplete in the 3/6 version.  There is now a listing of covariates in the pseudo code document (Table C, beginning on page 6).  Red font is used to indicate covariates that were revised or added.  Please continue to reach out if you have any questions or concerns.

We recognize it is somewhat cumbersome to extract lists of codes from Word tables, and that there are many such code lists for this phenotype.  However, we have chosen not to provide Excel versions of these code lists because of unhappy experiences in the past using Excel for such purposes; in attempting to automatically "improve" formatting of cell contents, Excel sometimes removes critical parts of codes.  We hope you understand.

Thanks,

David Carrell (carrell.d@ghc.org)

Aaron Scrol (scrol.a@ghc.org)

Thank you for catching this mistake.  As you indicate, the code should be 183.4 (NOT 184.4).  This has been corrected in the 4/18/2018 version of the pseudo code (soon to be posted to PheKB).
Thanks!
David

In data dictionary from Mar 09 2018, you are capturing oral contraceptive exposure as follows: oral_contra_expos_16_30 and oral_contra_expos_31_45 as number of years with any dispensings, and capturing the remainder of the oral contraceptive fields as number of calendar quarters with any dispensings. Just want to make certain this is your intent.

Thanks for catching this mistake.  They should all be measures in years.  We will update the data dictionary and re-post.

David

Would you mind giving an example of the following, using a code or subgroup from Column D:

To avoid rule-out codes we sometimes require diagnoses to satisfy a rule of "2/30" by which we mean a particular code (or small subgroup of similar codes) must appear at least twice over a period of at least 30 days. Note that when applying the 2/30 rule only codes of the same subgroup qualify for satisfying the 2/30 rule.

what are the subgroups of column D?

Thanks,

Barbara

Sorry for the confusing language.  "Sub groups" was referring to the types of cancer denoted in column G.  So, a patient must meet the 2/30 rule within the set of codes corresponding to a particular type of cancer from Column G.

Does that help?

David
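For sites implementing this, here is a minimal Python sketch of the "2/30" rule as described in this reply: codes are grouped by cancer type (Column G), and a type qualifies when its codes appear at least twice over a span of at least 30 days. The function name, the input layout, and the `code_to_type` mapping are hypothetical illustrations; the real code-to-type assignments come from Table 1 of the pseudo code document, which should be treated as authoritative.

```python
from collections import defaultdict
from datetime import date


def cancer_types_meeting_2_30(diagnoses, code_to_type, min_days=30):
    """Return the cancer types for which one patient meets the 2/30 rule.

    diagnoses: iterable of (diagnosis_code, diagnosis_date) pairs.
    code_to_type: hypothetical mapping of diagnosis code -> cancer type;
        in practice this would be built from Table 1 (Column G).
    """
    dates_by_type = defaultdict(list)
    for code, dx_date in diagnoses:
        cancer_type = code_to_type.get(code)
        if cancer_type is not None:  # ignore codes outside the mapping
            dates_by_type[cancer_type].append(dx_date)

    qualifying = set()
    for cancer_type, dates in dates_by_type.items():
        # At least two codes of the same type, with the first and last
        # at least min_days apart.
        if len(dates) >= 2 and (max(dates) - min(dates)).days >= min_days:
            qualifying.add(cancer_type)
    return qualifying


# Illustrative mapping only; "183.0" -> "ovarian" is an assumption.
example_map = {"182": "endometrial", "C54": "endometrial", "183.0": "ovarian"}
example_dx = [
    ("182", date(2015, 1, 1)),
    ("C54", date(2015, 2, 15)),   # same type, 45 days later
    ("183.0", date(2015, 3, 1)),
    ("183.0", date(2015, 3, 10)),  # same type, only 9 days later
]
print(cancer_types_meeting_2_30(example_dx, example_map))
```

In this example only the endometrial codes satisfy the rule, because two different codes of the same type count together while the two ovarian codes are too close in time.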

Can you please clarify the definition of the "2/30" rule.

Is the rule meant to be applied to each specific diagnosis code or to each cancer type (group of diagnosis codes)? Does a subject need to have the exact diagnosis code twice 30 days apart, or two diagnosis codes within a cancer type 30 days apart?

Barbara

Submitted by Xinnan Niu on

Please let us know when you are going to re-post the pseudo code. Thanks, Xinnan

Today we post an updated version of the pseudo code for the ovarian cancer phenotype (dated 4/18/2018).  This version corrects a typographical error in a diagnosis code in row #59 of Table 1 (revision shown in red font).  It also clarifies the "rule of 2/30" with additional text on page 3 (in red font) and in the header to Table 1 (also in red font).
Please reach out with any additional questions/concerns.
Thanks,
David

Hi All,

Today we posted an updated data dictionary (dated 4/18/2018) that resolves discrepancies identified in the 3/9/2018 version of the data dictionary.  Because it is not possible to track changes in a dictionary CSV file, here is a summary of the changes in the 4/18/2018 version:

The maximum allowed values (field MAX) for the following variables were increased to 20 (allowing for the possibility some patients had continuous quarterly exposure to these hormones during the specified 5-year periods):

MHT_Estrogen_Expos_45_49
MHT_Progest_Expos_45_49
MHT_Estrogen_Expos_50_54
MHT_Progest_Expos_50_54
MHT_Estrogen_Expos_55_plus
MHT_Progest_Expos_55_plus

The maximum allowed values (field MAX) for the following variables were increased to 15 (allowing for the possibility some patients had continuous annual exposure to these preparations during the specified 15-year periods):

Oral_Contra_Expos_16_30
Years_Outpat_Encs_16_30
Oral_Contra_Expos_31_45
Years_Outpat_Encs_31_45

The maximum allowed values (field MAX) for the following variables were changed to reflect maximum values consistent with the variables as defined:

Oral_Contra_Expos_45_49
Years_Outpat_Encs_45_49
Oral_Contra_Expos_50_54
Years_Outpat_Encs_50_54
Oral_Contra_Expos_55_plus
Years_Outpat_Encs_55_64

Descriptions (field VARDESC) for the following variables were changed to resolve inconsistencies between the UNITS specification (which were correct) and the descriptions in VARDESC (which were incorrect; the corrected descriptions are also listed here):

Oral_Contra_Expos_45_49
Corrected VARDESC: "Number of years with any dispensings of oral contraceptives (as defined in Table 9 of the pesudo code document) while age 45-49 (each year spans birthday +365 days)"

Oral_Contra_Expos_50_54
Corrected VARDESC: "Number of years with any dispensings of oral contraceptives (as defined in Table 9 of the pesudo code document) while age 50-54 (each year spans birthday +365 days)"

Oral_Contra_Expos_55_plus
Corrected VARDESC: "Number of years with any dispensings of oral contraceptives (as defined in Table 9 of the pesudo code document) while age 55 or older (each year spans birthday +365 days)"

Please reach out with any questions.

David
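As an illustration of the "number of years with any dispensings" measures summarized above, here is a minimal Python sketch, assuming each dispensing is assigned to the subject's integer age on the dispensing date. The function names and input layout are hypothetical. The sketch uses birthday-to-birthday years, which can differ by a day from the "birthday +365 days" wording in VARDESC when a leap day falls inside the year; sites should follow the data dictionary where the two disagree.

```python
from datetime import date


def age_on(birth_date, on_date):
    """Integer age in completed years on a given date."""
    years = on_date.year - birth_date.year
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in that calendar year
    return years


def years_with_dispensing(birth_date, dispensing_dates, min_age, max_age):
    """Count distinct age-years in [min_age, max_age] with any dispensing."""
    ages = {age_on(birth_date, d) for d in dispensing_dates}
    return sum(1 for age in ages if min_age <= age <= max_age)


birth = date(1960, 6, 15)
dispensings = [
    date(2005, 7, 1),    # age 45
    date(2005, 12, 1),   # age 45 again, counted once
    date(2006, 6, 15),   # age 46 (46th birthday)
    date(2010, 6, 14),   # age 49 (day before 50th birthday)
    date(2010, 6, 15),   # age 50, outside the 45-49 window
]
print(years_with_dispensing(birth, dispensings, 45, 49))
```

Counted this way, a window such as age 16-30 contains 15 age-years, which matches the MAX of 15 listed above for Oral_Contra_Expos_16_30.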

This addresses questions in an email sent directly to me (rather than posted here on PheKB) from Xinnan Niu at Vanderbilt.  I am posting the emailed questions and our answers so that all may benefit.

Issue #1:

For the covariates below, I will use the most recent preceding measurement for cases. For controls, I will follow Harvard’s sheet, which uses missing ‘.’ instead of the median or the measurement of those controls at their last visit.

Weight_kg_Ptype_Ca_Free_Last
Height_m_Ptype_Ca_Free_Last
BMI_Ptype_Ca_Free_Last

KPWA's answer: Please do not do this (using missing for controls).  The description field in the data dictionary clearly states that these measures are to be operationalized based on each subject's age when last known to be cancer free (see the variable description in column VARDESC for measure Age_Ptype_Cancer_Free), which is defined for both cases and controls.  Only use the specified missing value (".") if there are no data for the subject satisfying the specified criteria.

Issue #2:

For the covariate CA125_Above_Normal_Ever, there are two values: “0” for never above normal and “1” for above normal. For individuals who have had this test, we can assign “0” or “1”, but there is one scenario in which individuals, whether cases or controls, might not have had this lab test. In that scenario, which value should we assign, 0 or 1? I am going to leave it as “.” but that will violate the DD.

KPWA's answer: Please do not use missing value codes (".") for this measure as there is no scenario in which this is necessary.  The data dictionary defines measure CA125_Above_Normal_Ever as "Ever known to have a measurement in blood serum for cancer antigen 125 (CA125) that was above the relevant local lab's normal high threshold."  If, according to your data, a subject has not had this lab test, then this measure is legitimately coded as 0, which would indicate the patient was not known to have had an elevated lab value.  Here, a normal lab value and the absence of a lab value are both coded as zero for this measure.

Issue #3:

For the covariates HPV_Status and HPV_Status_Age, I am not able to prepare them because we don’t have the list of LOINC codes identified from our EMR. A regular-expression string search returns only one record. So, I am going to fill in the symbol “.” for these two covariates.

KPWA's answer:  Perfect--this is reasonable and consistent with the data dictionary. 

Cheers,

David
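The answer to Issue #2 can be sketched in Python as follows: a subject with no CA125 results and a subject with only normal results are both coded 0, and the missing code "." is never produced. The function name and the input layout (a list of measured value and local normal-high threshold pairs) are assumptions for illustration, and the threshold of 35 in the example is only an illustrative value, not a recommendation for any site.

```python
def ca125_above_normal_ever(results):
    """Return 1 if any CA125 result exceeds its lab's normal-high threshold.

    results: list of (measured_value, normal_high) pairs for one subject;
        an empty list means the subject never had the test.
    Returns 0 both for "never tested" and for "tested, never elevated".
    """
    return int(any(value > normal_high for value, normal_high in results))


print(ca125_above_normal_ever([]))                      # never tested
print(ca125_above_normal_ever([(20.0, 35.0)]))          # tested, normal
print(ca125_above_normal_ever([(20.0, 35.0), (60.0, 35.0)]))  # elevated once
```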

Submitted by Xinnan Niu on

Thanks David for your comments! I really appreciate them, but I am still confused about defining the control data and need to confirm further with you. In your definition, you said "For CASES this is the age (integer years) at the subject's last birthday before qualifying as a case per pseudo code Table 1 (i.e., the minimum of the Case_Age? measures for cases); for CONTROLS this is age (integer years) at last known follow-up". To collect weight, height, and BMI for a case, we can use the earliest available measurement at that age (the age when last known to be cancer free); if not available, we use the most recent preceding measurement. For cases, it is clear at what time point to pull measurements. However, for controls, what time point should we choose? The last known follow-up? If not, there are multiple available measurements for a subject in the control group. I need to confirm this. Thanks!

Submitted by Xinnan Niu on

Hi David, I am posting just 2 questions here because you already answered some of my questions in one of your posts.

*. MHT_Estrogen_Expos_45_49:
In your DD, your definition is "Number of calendar quarters with any dispensing of menopausal hormone therapy (MHT) containing ESTROGEN (as defined by Rule A in Table 11 of the pesudo code document) while age 45-49"
My question: The definition says "number of calendar quarters", which I calculated and counted, but the value is defined as binary, "0" or "1". So I need to confirm: do you want the number of calendar quarters, or just whether an individual was exposed or not?

*. Lower_GI_Ca_Ever;
In your DD, your definition is "Ever known to have been diagnosed with a cancer of the LOWER gastrointestinal tract (as defined in Table 4 of the pesudo code document) followed by another diagnosis code from the same set 30-730 days later (a rule-of-two)"
My question: I need to confirm with you about the "rule-of-two" and "another diagnosis code from the same set 30-730 days later".

My understanding of the "rule-of-two" is that, to qualify as a case, an individual should have at least two Dx codes (ICD-9 or ICD-10) from the same set of codes, and the time interval between those two codes should be within 30-730 days. Am I correct?

Hi Xinnan Niu,

Regarding your first question:

Thank you for discovering a discrepancy in our data dictionary.  The "VALUES" column for measure "MHT_Estrogen_Expos_45_49" should have read ".=Missing" to be consistent with the values of this measure, which is a count of the number of calendar quarters in a five-year period (per the "VARDESC" column).  As correctly noted in the data dictionary, values of this measure range from 0-20, per the dictionary's "MIN" and "MAX" columns.  Please allow this measure to reflect the count of calendar quarters.

This same discrepancy was present for a total of six measures, all of which record counts of calendar quarters:

MHT_Estrogen_Expos_45_49
MHT_Progest_Expos_45_49
MHT_Estrogen_Expos_50_54
MHT_Progest_Expos_50_54
MHT_Estrogen_Expos_55_plus
MHT_Progest_Expos_55_plus

These discrepancies have been corrected in the June 5, 2018 version of data dictionary, posted to PheKB today.  Please use the 6/5/18 dictionary.

Regarding your second question:

Measure Lower_GI_Ca_Ever should be operationalized as a "rule of two," as described in the dictionary, which you correctly summarize.  However, please note that the rule of two applies to both cases and controls (not just cases, as your post suggests).  Please reach out by phone (206-287-2705) if it would be helpful to discuss this.

Thanks much,

David
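For reference, a minimal Python sketch of the "rule of two" as summarized in this exchange: a diagnosis code from the set must be followed by another code from the same set 30-730 days later, and the same test applies to cases and controls. The function name and input layout are hypothetical; the qualifying code set is the one defined in Table 4 of the pseudo code document.

```python
from datetime import date


def meets_rule_of_two(dx_dates, min_gap=30, max_gap=730):
    """True if any two diagnosis dates are between min_gap and max_gap days apart.

    dx_dates: dates of diagnosis codes from ONE code set (for example the
        lower GI cancer set) for ONE subject, case or control.
    """
    ordered = sorted(set(dx_dates))
    for i, first in enumerate(ordered):
        for later in ordered[i + 1:]:
            gap = (later - first).days
            if gap > max_gap:
                break  # later dates are even further away
            if gap >= min_gap:
                return True
    return False


print(meets_rule_of_two([date(2010, 1, 1), date(2010, 1, 20)]))   # 19 days apart
print(meets_rule_of_two([date(2010, 1, 1), date(2010, 3, 1)]))    # 59 days apart
print(meets_rule_of_two([date(2010, 1, 1), date(2013, 1, 1)]))    # 1096 days apart
```

The sketch checks every pair of dates rather than only the first and last, so a subject with codes many years apart can still qualify if some later pair falls within the 30-730 day window.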

Hi David, assuming that high normal value and reference high value mean the same thing, Geisinger does not record a reference high for the cancer antigen 125 test. How do you suggest I handle this?  Thanks!

Hi Ken,

Do your CA 125 lab data indicate when this lab is "abnormally high" or something comparable?  If so, I believe you would still be able to populate three of the five CA125 measures (listed here):

CA125_Above_Normal_Ever
CA125_Quotient_Max (REPORT AS MISSING (".")  IF "NORMAL HIGH" NOT AVAILABLE IN LAB DATA)
CA125_Measured_Value
CA125_Normal_High_Value (REPORT AS MISSING (".")  IF NOT AVAILABLE IN LAB DATA)
CA125_Above_Normal_Age

Is that feasible?  If not, please follow up with another post or call me to discuss (206-287-2705).

Thanks,

David

Thanks for calling, Ken.  I should clarify that the measure CA125_Normal_High_Value is intended to capture the upper limit of what is considered a normal value for this lab (so it may be confusing to refer to it as a "normal high").  Nevertheless, CA125_Normal_High_Value is the value above which the CA125 is considered abnormal.  Thanks for helping us clarify this.

Because you know that this reference value for all of your CA125 labs happens to be 35 (which may not be the correct reference for other eMERGE sites), please populate CA125_Normal_High_Value with 35, and use it to calculate CA125_Quotient_Max.  This will allow you to populate all five of the CA125 measures.

CA125_Above_Normal_Ever
CA125_Quotient_Max
CA125_Measured_Value
CA125_Normal_High_Value
CA125_Above_Normal_Age

Cheers,

David

Hi again! I think you might have a typo in your pseudocode document. We have [10334-1] as the LOINC code for "Cancer Ag 125 [Units/volume] in Serum or Plasma" instead of  [10344-1]. Thanks!

Thank you, Ken, for catching this typographical error.  The correct LOINC code is 10334-1.  This has been corrected in the June 5, 2018 update to the pseudo code, just posted.

Cheers,

David

Hi All,

In response to errors/discrepancies posted today on this forum we have revised both our pseudo code and our data dictionary, and posted these updated documents to PheKB.  Both documents have suffix "2018_06_05" in their file names.

The pseudo code revision corrects a typo in a LOINC code for CA125 (correct value is 10334-1) in Table 6 on page 29 of the pseudo code document.

The data dictionary revisions correct the VALUES column of the following six measures to allow them to capture integer values within the (correctly) specified range (rather than forcing these to be binary 0/1 values):

MHT_Estrogen_Expos_45_49
MHT_Progest_Expos_45_49
MHT_Estrogen_Expos_50_54
MHT_Progest_Expos_50_54
MHT_Estrogen_Expos_55_plus
MHT_Progest_Expos_55_plus

Cheers,

David

Submitted by Xinnan Niu on

Hi David,

The values defined for the 6 covariates in your newly uploaded DD (version 2018_06_05) are still 0 or 1, meaning we will get errors when uploading the implemented data to PheKB, because the values in those fields are defined as 0 or 1 but some of the loaded number-of-quarters values are greater than 1, which will cause an error message to be thrown.

Thanks,

Xinnan

Oh my!  Yes, somehow the edited version was not preserved.  A revised data dictionary has just been posted, dated 6/6/2018.  In it the specifications for these measures are correct.

Thanks
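Sites may want to check their data against the dictionary limits before uploading. Below is a minimal Python sketch of such a check, assuming the dictionary rows have been read into dicts with the MIN and MAX fields mentioned in this thread. The `VARNAME` key, the function name, and the record layout are assumptions for illustration and may not match the actual CSV headers or the validation that PheKB performs.

```python
def out_of_range(records, dictionary_rows, missing_code="."):
    """List (record_index, variable, value) entries that violate MIN/MAX.

    records: list of dicts mapping variable name -> value as a string.
    dictionary_rows: list of dicts with "VARNAME" (assumed key), "MIN", "MAX".
    Values equal to missing_code are skipped.
    """
    limits = {
        row["VARNAME"]: (float(row["MIN"]), float(row["MAX"]))
        for row in dictionary_rows
    }
    problems = []
    for index, record in enumerate(records):
        for name, value in record.items():
            if name not in limits or value == missing_code:
                continue
            low, high = limits[name]
            if not low <= float(value) <= high:
                problems.append((index, name, value))
    return problems


dictionary_rows = [{"VARNAME": "MHT_Estrogen_Expos_45_49", "MIN": "0", "MAX": "20"}]
records = [
    {"MHT_Estrogen_Expos_45_49": "7"},   # within 0-20
    {"MHT_Estrogen_Expos_45_49": "."},   # missing, skipped
    {"MHT_Estrogen_Expos_45_49": "21"},  # above MAX
]
print(out_of_range(records, dictionary_rows))
```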

Submitted by Xinnan Niu on

Hi David,

Thanks for your confirmation to collect BMI, weight, and height using the last known follow-up as the time point. However, in our case there are many missing data if we use the last known follow-up (based on encounter/visit information). We won't get as many missing data if we pull these measurements based on their latest measurement time (some of which overlap with the last known follow-up). Please let us know which one you prefer.

Yes, as noted in the definitions of the height, weight, and BMI measures, if a measurement is not available on the preferred date you are instructed to "use the most recent preceding measurement".
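One way to read this instruction is sketched below in Python: take the latest measurement on or before the reference date, and fall back to the missing code "." only when no such measurement exists. The function name and input layout are hypothetical, and the sketch assumes the reference point has already been converted to a date; the data dictionary defines it in terms of the age when the subject was last known to be cancer free, so the dictionary wording takes precedence over this simplification.

```python
from datetime import date


def measurement_as_of(measurements, reference_date, missing_code="."):
    """Return the value measured on, or most recently before, reference_date.

    measurements: list of (measurement_date, value) pairs for one subject.
    Returns missing_code if nothing was measured on or before that date.
    """
    eligible = [(d, v) for d, v in measurements if d <= reference_date]
    if not eligible:
        return missing_code
    # max() by date picks the most recent eligible measurement.
    return max(eligible, key=lambda pair: pair[0])[1]


weights_kg = [
    (date(2012, 5, 1), 70.5),
    (date(2014, 9, 12), 72.0),
    (date(2016, 2, 3), 74.2),   # after the reference date, not used
]
print(measurement_as_of(weights_kg, date(2015, 1, 1)))
```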

Table 5 ICD-10-PCS OU57_ZZ, OUL7_ZZ, OUT5_ZZ, OUT6_ZZ, OUT7_ZZ codes should start with number 0, not letter O. 

--Sunny

Oh my!  Nice catch Sunny!  We just posted an updated data dictionary (dated June 7, 2018) repairing the code errors in Table 5.

Thanks,

David
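Errors of this kind can be caught mechanically, because ICD-10-PCS codes never contain the letters O or I (they are excluded to avoid confusion with the digits 0 and 1). A minimal Python sketch follows, assuming the code lists have been extracted as strings and that "_" is used as a wildcard position as in Table 5; the function names are hypothetical.

```python
def suspicious_pcs_codes(codes):
    """Return codes containing letters that ICD-10-PCS never uses (O, I)."""
    return [code for code in codes if "O" in code.upper() or "I" in code.upper()]


def replace_letter_o(code):
    """Suggest a repair by replacing the letter O with the digit 0."""
    return code.upper().replace("O", "0")


table_5_codes = ["OU57_ZZ", "OUL7_ZZ", "OUT5_ZZ", "OUT6_ZZ", "OUT7_ZZ"]
flagged = suspicious_pcs_codes(table_5_codes)
print(flagged)
print([replace_letter_o(code) for code in flagged])
```

Any suggested repair should still be confirmed against the posted pseudo code before use.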