Cancer Moonshot Data Documentation

The Graph Cancer Moonshot dataset details how cancer patients enter and exit hospitals. It is aggregated Medicare data covering 2010 through 2015, inclusive: six years' worth of data.

All data respects the 11-patient threshold rule. Data rows that included fewer than 11 patients over the six-year period were excluded from this data release.

We currently use a tranching method so that we can release several versions of the data without concern that different versions could be combined to reveal "small cells" of patient data.

This first release is the most aggressively tranched version. Patient counts are replaced as follows:
When there are under 100 patients, we write 100.
When there are over 100 but under 500, we write 500.
When there are over 500 but under 1000, we write 1000.
When there are over 1000 but under 5000, we write 5000.
When there are over 5000, we write 10000.
Patient counts over 10000 are very rare.
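
The replacement rule above can be sketched as a small Python function. This is only an illustration, not the code used to produce the release. The function name tranche_count is hypothetical, and the text does not say how exact boundary values (100, 500, 1000, 5000) are handled, so the sketch assumes each boundary falls into the lower tranche.

```python
def tranche_count(count: int) -> int:
    """Map a raw patient count to the value written in this release.

    Assumption: a count exactly equal to a boundary (100, 500, 1000, 5000)
    is placed in the lower tranche. The documentation does not specify this.
    """
    if count < 11:
        # Rows below the 11-patient threshold are not released at all.
        raise ValueError("rows with fewer than 11 patients are excluded")
    # Each upper bound doubles as the value written for its tranche.
    for upper_bound in (100, 500, 1000, 5000):
        if count <= upper_bound:
            return upper_bound
    return 10000


# Illustrative inputs only, not values taken from the data set.
for raw in (42, 250, 750, 3000, 12000):
    print(raw, "->", tranche_count(raw))
```

Because every released value is one of five constants, two rows with the same released value can differ by a large factor in their true counts.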

This type of tranching can make simple aggregate statistics very difficult, but that is not the type of analysis this data was intended to enable. The time span is long enough that any value under 100 is a relatively rare event, and the released values still give a general "orders of magnitude" sense of scale differences. This data represents a very complex graph structure. Using it, it is possible to compare the graph structures of different hospitals to determine whether the structure itself is different.
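
As one possible way to compare graph structures between hospitals, the sketch below groups code-to-code edges by organization_npi and measures their overlap with Jaccard similarity. Jaccard similarity is a choice made for this example, not a method prescribed by this release. The column names are those documented in the file descriptions later in this document. The helper names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def edges_by_hospital(rows, source_col, target_col):
    """Group (source code, target code) pairs by organization_npi."""
    graph = defaultdict(set)
    for row in rows:
        graph[row["organization_npi"]].add((row[source_col], row[target_col]))
    return graph


def jaccard(edges_a, edges_b):
    """Share of edges common to both hospitals, between 0.0 and 1.0."""
    union = edges_a | edges_b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(edges_a & edges_b) / len(union)


# Made-up rows with placeholder NPIs and codes, not real data.
SRC = "claim_admitting_diagnosis_code"
DST = "primary_claim_diagnosis_code"
sample_rows = [
    {"organization_npi": "NPI_A", SRC: "ADM_1", DST: "NEO_1"},
    {"organization_npi": "NPI_A", SRC: "ADM_2", DST: "NEO_1"},
    {"organization_npi": "NPI_B", SRC: "ADM_1", DST: "NEO_1"},
    {"organization_npi": "NPI_B", SRC: "ADM_3", DST: "NEO_2"},
]

graphs = edges_by_hospital(sample_rows, SRC, DST)
similarity = jaccard(graphs["NPI_A"], graphs["NPI_B"])
print(round(similarity, 3))
```

This comparison ignores patient_count and claim_count entirely, so it is unaffected by the tranching of those values.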

Later versions of this data release will have more specific numbers, but only after we have confirmation from the community that the structure of data that we are releasing is optimal. At that point, it will be possible to perform both a structural and statistical analysis.

Until the recent shift to ICD-10, ICD-9 codes were used to record how patients were diagnosed when they entered a hospital. While a given patient can have several diagnosis codes covering a hospital visit, two are specifically highlighted: the “primary” diagnosis and the “admitting” diagnosis. The primary diagnosis is what the hospital expects to be compensated for (i.e., the “real” diagnosis that justifies the visit to the hospital). The admitting diagnosis is the diagnosis given to the patient to justify the admission. Usually, the primary and admitting diagnoses are the same.

The first file details the relationships, on a per-organizational-NPI basis, between the admission and primary diagnosis codes. Usually, when a patient visits a hospital, the admission code and the primary code are the same. This is intuitive: if a patient is admitted for the treatment of breast cancer, usually she will be treated for breast cancer. However, if the patient is admitted for heart disease but is ultimately treated primarily for breast cancer, that could be interesting. We have provided a list of every pair of admission and primary diagnosis codes where the primary diagnosis code was an ICD-9 code in the neoplasm category.
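
The "interesting" rows described above, where the admitting and primary diagnosis codes differ, could be selected with a short Python sketch like the one below. It assumes the CSV file has a header row using the column names listed in the file descriptions and that a code is written identically in both columns. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def mismatched_rows(rows):
    """Keep rows whose admitting and primary diagnosis codes differ."""
    return [
        row
        for row in rows
        if row["claim_admitting_diagnosis_code"].strip()
        != row["primary_claim_diagnosis_code"].strip()
    ]


# Made-up sample with placeholder NPIs and codes. For the real file, pass
# an open handle on moonshot_admitting_diag_to_primary_diag_1015.csv to
# csv.DictReader instead (assuming it has a header row).
sample_csv = """organization_npi,claim_admitting_diagnosis_code,primary_claim_diagnosis_code,patient_count,claim_count
NPI_A,CODE_1,CODE_1,100,100
NPI_A,CODE_2,CODE_1,100,100
NPI_B,CODE_3,CODE_3,500,500
"""

sample_rows = list(csv.DictReader(io.StringIO(sample_csv)))
mismatches = mismatched_rows(sample_rows)
for row in mismatches:
    print(row["organization_npi"],
          row["claim_admitting_diagnosis_code"],
          "->",
          row["primary_claim_diagnosis_code"])
```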

The second and third files detail how admission diagnosis codes relate to the other coded information that CMS has about these hospital admissions, specifically their “Claim Inpatient Admission Type Code” and “Claim Source Inpatient Admission Code” which detail the specific source of an admission and the type of admission. We have included additional data files that detail the meaning of these code values.

The fourth file details the relationship between the patient diagnosis code and coded discharge data. As with the second and third files, we have included a CSV file that details the meaning of the discharge codes. These codes indicate things like “discharged to home” vs. “discharged to skilled nursing facilities.”
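
A minimal Python sketch of attaching code meanings to data rows follows. It assumes each lookup file has a header row with the column names listed in the file descriptions (for example stus_cd and short_description in status_code.csv). The function name, the placeholder codes, and the sample rows are made up for illustration.

```python
def attach_description(rows, lookup_rows, code_col, lookup_key):
    """Return copies of rows with a short_description looked up by code.

    Codes missing from the lookup get None rather than raising an error.
    """
    descriptions = {
        entry[lookup_key].strip(): entry["short_description"]
        for entry in lookup_rows
    }
    labeled = []
    for row in rows:
        labeled_row = dict(row)  # copy so the input rows stay unchanged
        labeled_row["short_description"] = descriptions.get(row[code_col].strip())
        labeled.append(labeled_row)
    return labeled


# Made-up lookup and data rows; real rows would come from csv.DictReader
# over status_code.csv and moonshot_primary_diag_to_discharge_code_1015.csv.
lookup = [
    {"stus_cd": "CODE_X", "short_description": "discharged to home"},
]
data = [
    {"organization_npi": "NPI_A", "patient_discharge_status_code": "CODE_X"},
    {"organization_npi": "NPI_A", "patient_discharge_status_code": "CODE_Y"},
]

labeled = attach_description(data, lookup, "patient_discharge_status_code", "stus_cd")
for row in labeled:
    print(row["patient_discharge_status_code"], row["short_description"])
```

The same function would work for the admission type and admission source files by passing type_adm or src_adms as the lookup key.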

This is a beta release of this data set. We would appreciate feedback on the usefulness of these data for research purposes. This data set is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs license, version 3.0 or later, available here: https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode



##########################################################################################
##########################################################################################
######                                                                              ######
######                          File Description / Detail                           ######
######                                                                              ######
##########################################################################################
##########################################################################################

admit_code.csv          - contains admitting codes and their descriptions
--------------------------------------------------------------
Number of Records: 7
type_adm           - Admit code
short_description
description


admit_source.csv          - contains referral source
---------------------------------------------------------------
Number of Records: 17
src_adms           - Referral code
short_description
description


status_code.csv           - contains discharge code
---------------------------------------------------------------
Number of Records: 45
stus_cd           - Discharge code
short_description
description


moonshot_admitting_code_to_admitting_diag_1015.csv          - contains admitting code to admitting diagnosis map
------------------------------------------------------------------------------------------------------------------------------
Number of Records: 24,849
Number of Distinct NPIs: 2,209
organization_npi           - Organization NPI
claim_inpatient_admission_type_code           - Admitting Code (admit_code.csv)
claim_admitting_diagnosis_code          - Admitting Diagnosis code (ICD-9)
patient_count           - Number of Patients
claim_count           - Number of Claims


moonshot_admitting_source_to_admitting_diag_1015.csv          - contains admitting source to admitting diagnosis map
------------------------------------------------------------------------------------------------------------------------------
Number of Records: 25,354
Number of Distinct NPIs: 2,194
organization_npi           - Organization NPI
claim_source_inpatient_admission_code           - Admitting Source (admit_source.csv)
claim_admitting_diagnosis_code           - Admitting Diagnosis code (ICD-9)
patient_count           - Number of Patients
claim_count           - Number of Claims


moonshot_admitting_diag_to_primary_diag_1015.csv          - contains admitting diagnosis to primary diagnosis map
------------------------------------------------------------------------------------------------------------------------------
Number of Records: 28,189
Number of Distinct NPIs: 2,164
organization_npi           - Organization NPI
claim_admitting_diagnosis_code           - Admitting Diagnosis code (ICD-9)
primary_claim_diagnosis_code           - Primary Diagnosis code (ICD-9)
patient_count           - Number of Patients
claim_count           - Number of Claims


moonshot_primary_diag_to_discharge_code_1015.csv          - contains primary diagnosis to discharge code map
------------------------------------------------------------------------------------------------------------------------------
Number of Records: 37,335
Number of Distinct NPIs: 2,133
organization_npi           - Organization NPI
patient_discharge_status_code           - Discharge Code (status_code.csv)
primary_claim_diagnosis_code           - Primary Diagnosis code (ICD-9)
patient_count           - Number of Patients
claim_count           - Number of Claims