Clinical Assessment Data

AMP PD harmonizes, or standardizes, similar data collected across BioFIND, HBS, PDBP, and PPMI. This data curation and transformation process simplifies cross-cohort analysis. More specifically, variable names from AMP PD studies are aligned to a global mapping file, and the final curation is reviewed by AMP PD. This Harmonized Dictionary, based on CDISC terminology, is available as a reference for the harmonized clinical dataset and is linked with the Harmonized Assessment and Variable Matrix. Harmonized cohort data is made available in AMP PD through BigQuery.

Data Curation Workflow

Data from four different Parkinson’s disease studies were harmonized to the same standard, curated, and consolidated into one dataset using automated and manual approaches. To harmonize and standardize metadata for the AMP PD project, a global mapping file (the Harmonized Dictionary) aligning variables between datasets was created first. CDISC terminology was used for harmonized variable names and descriptions when possible. A coding file was then created to decode numerically coded variables; clean up and standardize medication names, diagnoses, levels of education, etc.; and align visit names between cohorts. After the mapping and coding files were generated, an automated tool was applied to transform the data files and integrate the four datasets into one set of curated files. Manual inspection of the transformed files followed each phase of automatic transformation. The content of each transformed file was approved by a curator, and all needed adjustments were performed manually. Finally, mapping files (dictionaries) for uploading data into BigQuery tables were produced by processing the content of the curated dataset with an additional R script.

The curation workflow consists of three main steps:

1. Data acquisition and review
2. Data harmonization
3. Data transformation/curation and QC
 

Data Acquisition and Review

Based on the priority assigned by the AMP PD Clinical Data Harmonization (CDH) group, the data was split into two batches: Subset 1 & Subset 2. Considerations and approach for prioritization of clinical data to be harmonized:

  • Key variables critical for interpreting biological data (e.g. demographics)
  • Variables to increase ease of use of biological data (e.g. genotype)
  • Relevance and importance to Parkinson's disease
  • Data complementary to biologic data generated through AMP PD
  • Identified as the highest priority based on collective input from research experts in the PD community

 

Data Subset 1

Output: 17 harmonized domains/categories

  • 179 unique fields from 24 PPMI files
  • 157 unique fields from 17 BioFIND files
  • 197 unique fields from 14 PDBP files
  • 155 unique fields from 1 HBS file

Data Subset 2

Output: 10 harmonized domains/categories

  • 37 unique fields from 8 PPMI files
  • 10 unique fields from 3 BioFIND files
  • 26 unique fields from 2 PDBP files
  • 42 unique fields from 2 HBS files

Data Harmonization



Metadata variables were harmonized according to data compatibility, following the suggestions, decisions, and final approval of the Clinical Data Harmonization (CDH) group. CDISC terminology was used for Title and Description where available. Values of harmonized variables from different studies were standardized and included in the coding file. The coding file contains decodes for numerically coded variables; cleans up and standardizes medication names, diagnoses, levels of education, etc.; and aligns visit names between cohorts.
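As an illustration of how a mapping file and a coding file work together, the following is a minimal Python sketch. It is not the SmartConverter implementation: the cohort labels, source variable names, codes, and harmonized names (`MAPPING`, `CODING`, `harmonize`, `COHORT_A`, `sex_cd`, `M0`, and so on) are all hypothetical and chosen only to show the idea of renaming variables and decoding values.

```python
# Hypothetical mapping file: (cohort, source variable) -> harmonized variable.
# None of these names come from the actual Harmonized Dictionary.
MAPPING = {
    ("COHORT_A", "GENDER"): "sex",
    ("COHORT_B", "sex_cd"): "sex",
    ("COHORT_A", "EVENT"): "visit_name",
    ("COHORT_B", "visit"): "visit_name",
}

# Hypothetical coding file: harmonized variable -> {raw value: standard value}.
CODING = {
    "sex": {"1": "Male", "2": "Female", "M": "Male", "F": "Female"},
    "visit_name": {"BL": "M0", "Baseline": "M0", "12 month": "M12"},
}


def harmonize(cohort, record):
    """Rename source variables and decode their values for one raw record."""
    out = {}
    for source_name, raw_value in record.items():
        target = MAPPING.get((cohort, source_name))
        if target is None:
            continue  # variable is not in the (hypothetical) mapping file
        key = str(raw_value).strip()
        # Values without a decode pass through unchanged, as strings.
        out[target] = CODING.get(target, {}).get(key, key)
    return out


print(harmonize("COHORT_A", {"GENDER": 1, "EVENT": "BL", "SITE": 12}))
# → {'sex': 'Male', 'visit_name': 'M0'}
print(harmonize("COHORT_B", {"sex_cd": "M", "visit": "12 month"}))
# → {'sex': 'Male', 'visit_name': 'M12'}
```

Keeping the variable mapping separate from the value decodes mirrors the two files described above: the first decides which variables line up across cohorts, the second decides how their values are standardized.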
 

Data Transformation/Curation and QC

Both automated (the custom SmartConverter tool) and manual approaches were used to perform data transformations. The original data files were inspected for extended ASCII characters, number of patients, visit types, and the availability of codes and their decodes in supporting study documents. Transformation templates and the coding file were prepared based on the Harmonized Dictionary and curation decisions to perform three rounds of transformation/consolidation using SmartConverter. After each round, the output files were inspected, and additional manual transformations were performed before the next round of automated transformation and after the final curation. Subset 1 and Subset 2 were curated separately using the same approach, described below:

Step 1: Transform Raw Data

  1. Prepare vocabularies and add to primary code file
  2. Organize data-files by study
  3. Create coding file and transformation template
  4. Run SmartConverter Round 1 and perform QC

Step 2: Transform & Consolidate

  1. Organize curated files into distinct study folders
  2. Modify transformation template
  3. Consolidate subset 1 and subset 2 categories
  4. Run SmartConverter Round 2 and perform QC

Step 3: Transform & Finalize

  1. Add and consolidate clinical data (e.g. missing diagnosis inputs)
  2. Remove and substitute fields
  3. Run SmartConverter Round 3 and perform QC
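The inspection of original files for extended ASCII characters, mentioned at the start of this section, can be sketched in a few lines of Python. This is an assumed approach for illustration only, not the procedure the curators used; the function name `find_extended_chars` and the sample data are hypothetical.

```python
def find_extended_chars(text):
    """Return (line number, column, character) for each non-ASCII character.

    Positions are 1-based. Any character with a code point above 127 is
    reported, which covers extended ASCII as well as other Unicode.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical file content with one accented character on line 3.
sample = "subject_id,note\n1001,stable\n1002,na\u00efve\n"
print(find_extended_chars(sample))
# → [(3, 8, 'ï')]
```

Reporting the position of each hit, instead of only a yes/no answer, makes the follow-up manual inspection easier.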

Clinical Data Validation Plan

The AMP PD Clinical Data Harmonization (CDH) team crafted a plan to further validate the results of the harmonization process. The purpose of the validation plan was to: 

  1. Ensure no new errors were introduced into the clinical data as a result of the data harmonization process
  2. Facilitate identification of records that should be excluded from the public release
  3. Identify a set of tests that can be run to validate additional data submission from the current AMP PD cohorts as well as future data submissions from new cohorts

The CDH team constructed 42 individual cohort tests, identified 23 unique tests to run against harmonized data from all four cohorts, and identified 19 tests that were not valid against harmonized data because of excluded or modified data points or changes to data structures.

The following key decisions and outputs were made as a result of executing the validation plan: 

  1. Alignment of SmartConverter data outputs against program and cohort specific tests

  2. Final inclusion/exclusion release criteria for clinical data

  3. Secondary dataset(s) for further analysis and curation for potential future release

  4. Confirmed AMP PD Subject Master List

  5. Final AMP PD clinical dataset for public release

   
 

Cohort & Across Cohort Business Rules

AMP PD received cohort-specific business rules from BioFIND, HBS, PDBP, and PPMI. The cohorts applied these rules to the raw data inputs before the clinical data harmonization process. As part of the QC process, the business rules were re-checked after harmonization to ensure they were still valid.

HBS Cohort Specific Data Checks

Test | Description
Discordant Sex Check | Reported sex should be the same across multiple visits and studies
REM Sleep Behavior Disorder Questionnaire Check | Check RBD checklist score does not exceed 13
UPDRS total score checking | Check total score does not exceed 199
UPDRS subscale score checking | Check UPDRS subscale scores do not exceed the following: Section I: 16 points; Section II: 52; Section III: 108; and Section IV: 23
MMSE outlier check | Check MMSE score does not exceed 30
Change in diagnosis | Check consistency of diagnosis across multiple visits and studies
Medical history consistency | Check consistency of medical history across multiple visits and studies (if lifetime condition reported "YES" in one visit, following visits should not be "NO")
Family history consistency | Check consistency of family history across multiple visits and studies (if lifetime condition reported for family member "YES" in one visit, following visits should not be "NO")
PD risk factor consistency | Check consistency of PD risk factors across multiple visits and studies (if lifetime risk reported "YES" in one visit, following visits should not be "NO")
Known pregnancy | Check that pregnancy is marked "N/A" in males
Height | Check consistency of reported height across multiple visits and studies
Age consistency | Check consistency of age, adjusting for time, across multiple visits and studies
Ethnicity consistency | Ethnicity should be the same across multiple visits and studies
Race consistency | Race should be the same across multiple visits and studies
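Two kinds of HBS checks listed above lend themselves to a short sketch: upper-bound checks on scores, and the rule that a lifetime item reported "YES" should not later be reported "NO". The Python below is a minimal illustration under stated assumptions. The score limits are the ones given above, but the field names (`updrs_total`, `mmse`, and so on) and function names are hypothetical and do not reflect the actual HBS or AMP PD schema.

```python
# Upper bounds from the HBS checks listed above; field names are hypothetical.
MAX_SCORE = {
    "rbd_checklist": 13,
    "updrs_total": 199,
    "updrs_part1": 16,
    "updrs_part2": 52,
    "updrs_part3": 108,
    "updrs_part4": 23,
    "mmse": 30,
}


def out_of_range(record):
    """Return the names of scores in one visit record that exceed their bound."""
    return [
        name
        for name, limit in MAX_SCORE.items()
        if record.get(name) is not None and record[name] > limit
    ]


def yes_then_no(answers):
    """Return True if "NO" is reported after an earlier "YES".

    `answers` holds one subject's responses to a lifetime item in visit order.
    """
    seen_yes = False
    for answer in answers:
        if answer == "YES":
            seen_yes = True
        elif answer == "NO" and seen_yes:
            return True
    return False


print(out_of_range({"updrs_total": 204, "mmse": 28, "updrs_part3": 40}))
# → ['updrs_total']
print(yes_then_no(["NO", "YES", "NO"]))  # → True
print(yes_then_no(["NO", "NO", "YES"]))  # → False
```

Note that the four subscale limits sum to the total-score limit (16 + 52 + 108 + 23 = 199), so the subscale and total checks are consistent with each other.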

 

PDBP Cohort Specific Data Checks

Test | Description
Gender consistency check | Gender should be the same for the same GUID across PDBP cohorts
Ethnicity consistency check | Ethnicity should be the same for the same GUID across PDBP cohorts
Race consistency check | Race should be the same for the same GUID across PDBP cohorts
Age consistency check | For the same GUID and same Visit type, age should be the same (multi-enrolled subjects are exceptions)
Visit date consistent checking | For the same GUID and same Visit type, visit date should be the same (multi-enrolled subjects are exceptions)
Neurological Examination self-conflict checking | 'InclusnXclusnCntrlInd' should be consistent with 'NeuroExamPrimaryDiagnos'
MoCA inconsistent with education level check | Check whether the subject received their 0 or 1 score according to education level
MoCA outliers check | Check MoCA score higher than 30
MDS-UPDRS Part III score scale check | For case, Part III score should be >10; for control, Part III score should be <=10 (0-10)
MDS-UPDRS Part III score trend check | Control subject scores should decrease, while Case subject scores should increase
MoCA control check | Check MoCA score lower than 20 if subject is a control
Missing form check | The required clinical assessment is not filled in or not submitted
Retention rate check | Check drop-outs per site
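The GUID-based consistency checks and the MDS-UPDRS Part III scale check listed above could be sketched as follows. This is an illustrative Python example, not PDBP's own implementation. The record layout, the field names (`guid`, `gender`), the group labels (`case`, `control`), and the function names are assumptions; only the threshold of 10 comes from the check description.

```python
from collections import defaultdict


def inconsistent_guids(records, field):
    """Return the GUIDs whose records carry more than one value for `field`."""
    values = defaultdict(set)
    for rec in records:
        values[rec["guid"]].add(rec[field])
    return sorted(guid for guid, seen in values.items() if len(seen) > 1)


def part3_scale_flag(group, score):
    """Return True when a Part III score breaks the scale rule.

    Rule from the check description: case > 10, control <= 10.
    """
    if group == "case":
        return score <= 10  # a case scoring in the control range
    if group == "control":
        return score > 10  # a control scoring in the case range
    raise ValueError(f"unknown group: {group!r}")


# Hypothetical records: G2 is reported with two different genders.
records = [
    {"guid": "G1", "gender": "Female"},
    {"guid": "G1", "gender": "Female"},
    {"guid": "G2", "gender": "Male"},
    {"guid": "G2", "gender": "Female"},
]
print(inconsistent_guids(records, "gender"))  # → ['G2']
print(part3_scale_flag("case", 8), part3_scale_flag("control", 8))  # → True False
```

The same `inconsistent_guids` helper would cover the ethnicity and race checks by passing a different field name, assuming those values are stored per record in the same way.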

PPMI Cohort Specific Data Checks

Test | Description
Enrollment pending | Check for consented subjects who have not yet enrolled or screen failed after 2 months or more
Premature Withdrawal (PW) consistency | Check for agreement in PW status across datasets: CONCL, Reportable Events (Incidents)
MOCA outliers | Check for MOCA scores > 30
MDS-UPDRS Part III data checks [1] | Check for Control subjects with two Part III scores at same visit
MDS-UPDRS Part III data checks [2] | Check for subjects with ON score worse than OFF score at same visit
MDS-UPDRS Part III data checks [3] | Check for subjects with 2 ON scores at same visit
MDS-UPDRS Part III data checks [4] | Check for subjects with 2 OFF scores at same visit
MDS-UPDRS Part III data checks [5] | Check for subjects with two different PD_MED_USE at same visit
PD Medication start date consistency | Start date for initiation of PD medications should agree across datasets: CONMED, Reportable Events (Incidents)
PD Medication Use consistency | PD Med Use should agree for subjects at each visit across datasets: PDMEDUSE, NUPDRS3, CONMED, Reportable Events (Incidents)
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [1] | Check that clinical data visit labels match lab/imaging visit labels
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [2] | Check for lab/imaging results at visits where the clinical data indicates the lab/image was not collected
CONMED data checks [1] | Check for typos/unknown values in dose units and dose frequencies
CONMED data checks [2] | Check for conmeds missing WHODRUG classification
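MDS-UPDRS Part III data checks [2] through [4] above all operate on scores grouped by subject and visit, so they can be sketched together. The Python below is a hedged illustration only: the row layout `(subject, visit, state, score)`, the function name `part3_flags`, and the sample values are hypothetical, and the code assumes that a higher Part III score means a worse motor state.

```python
from collections import defaultdict


def part3_flags(rows):
    """Group Part III rows by (subject, visit) and return the flagged checks.

    Each row is (subject, visit, state, score), with state "ON" or "OFF".
    Assumes a higher score is worse.
    """
    by_visit = defaultdict(lambda: {"ON": [], "OFF": []})
    for subject, visit, state, score in rows:
        by_visit[(subject, visit)][state].append(score)

    flags = []
    for key in sorted(by_visit):
        on, off = by_visit[key]["ON"], by_visit[key]["OFF"]
        if len(on) > 1:
            flags.append((key, "two ON scores"))
        if len(off) > 1:
            flags.append((key, "two OFF scores"))
        # Compare only when there is exactly one score per state.
        if len(on) == 1 and len(off) == 1 and on[0] > off[0]:
            flags.append((key, "ON worse than OFF"))
    return flags


# Hypothetical rows for three subjects.
rows = [
    ("S1", "V04", "ON", 30),
    ("S1", "V04", "OFF", 22),
    ("S2", "V04", "ON", 15),
    ("S2", "V04", "ON", 17),
    ("S3", "BL", "OFF", 25),
    ("S3", "BL", "ON", 20),
]
print(part3_flags(rows))
# → [(('S1', 'V04'), 'ON worse than OFF'), (('S2', 'V04'), 'two ON scores')]
```

Skipping the ON/OFF comparison when a state has duplicate scores is a design choice made for this sketch: those visits are already flagged as duplicates and need manual review first.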

Harmonized Assessment & Variable Matrix

The following variables are harmonized across a breadth of standard assessments from two or more AMP PD cohorts. The full AMP PD Data Dictionary provides additional details for each variable, such as its definition, values, schema, and curation notes.

Matrix columns: Assessments | Harmonized Variables | BioFIND | HBS | PDBP | PPMI

Assessments covered:

  • Enrollment
  • Demographics
  • Medical History
  • Environment Risk Factors
  • Clinical Assessments
  • H&Y (see MDS-UPDRS Part III)
  • UPDRS
  • Biospecimen Analyses: Sphingolipids (plasma & CSF)