This repository is under review for potential modification in compliance with Administration directives.
The AMP PD Knowledge Portal was developed to host and share resources related to Parkinson’s disease research and remains fully operational. We continue to maintain and accept Parkinson’s disease and related disorders data and resources throughout this review process.

Clinical Assessment Data

AMP PD harmonizes, or standardizes, similar data collected across BioFIND, HBS, LBD, LCC, PDBP, PPMI, STEADY-PD3, and SURE-PD3. This data curation and transformation process facilitates and simplifies cross-cohort analysis. More specifically, variable names from AMP PD studies are aligned to a global mapping file (the Harmonized Dictionary), and final curation is reviewed by AMP PD. The Harmonized Dictionary, based on CDISC terminology, is available as a reference for the harmonized clinical dataset and is linked with the Harmonized Assessment and Variable Matrix. Harmonized cohort data is made available in AMP PD through BigQuery.

Quality control of the AMP PD clinical data was performed by Alena Fedarovich of Rancho BioSciences and by Bary Landin and Dave Vismer of Technome, as part of a contract with the Foundation for the National Institutes of Health (FNIH).

Rancho BioSciences

Technome

Harmonized Assessment & Variable Matrix

The following variables are harmonized across a breadth of standard assessments from two or more AMP PD cohorts. Click a variable to view additional details such as its definition, values, schema, and curation notes. If you want to download a version of the full AMP PD Data Dictionary, click one of the buttons below for a specific format.

To learn more, click on the Harmonized Variables.

Assessments | Harmonized Variables | BioFIND | HBS | LBD | LCC | PDBP | PPMI | SURE-PD3 | STEADY-PD3
Enrollment
Demographics
Medical History
Environment Risk Factors
Clinical Assessments
Biospecimen Analyses

Data harmonization infographic: AMP PD Cohorts: Data ingestion of disparate sources (BioFIND, PPMI, PDBP, and HBS) - Independent clinical datasets. Rancho BioSciences: SmartConverter & Curation Pipeline: Data Curation, Harmonization, and Standardization. AMP PD Knowledge Portal: Google Cloud Platform.

 

Data Curation Workflow

 

Data from four different Parkinson’s disease studies were harmonized to the same standard, curated, and consolidated into one dataset using automated and manual approaches. To harmonize and standardize metadata for the AMP PD project, a global mapping file (the Harmonized Dictionary) aligning variables between datasets was created first. CDISC terminology was used for harmonized variable names and descriptions when possible. A coding file was then created to decode numeric coded variables; clean up and standardize medication names, diagnoses, levels of education, etc.; and align visit names between cohorts. After the mapping and coding files were generated, an automated tool was applied to transform the data files and integrate the four datasets into one set of curated files. Manual inspection of the transformed files followed each phase of automatic transformation. The content of each transformed file was approved by a curator, and all needed adjustments were performed manually. Finally, mapping files (dictionaries) for uploading data into BigQuery tables were produced by processing the content of the curated dataset with an additional R script.
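
The role of the global mapping file can be illustrated with a minimal Python sketch. It is not the AMP PD tooling: the cohort labels, source variable names, and harmonized names below are hypothetical, and the real mapping file is far larger. The sketch only shows the idea of renaming cohort-specific variables to shared names and consolidating the result, while keeping track of anything the mapping does not cover so that a curator can review it.

```python
# Hypothetical mapping-file content: (cohort, source variable) -> harmonized name.
# None of these names come from the actual AMP PD Harmonized Dictionary.
MAPPING = {
    ("COHORT_A", "GENDER"): "sex",
    ("COHORT_A", "EDUCYRS"): "education_years",
    ("COHORT_B", "Sex"): "sex",
    ("COHORT_B", "YearsEducation"): "education_years",
}


def harmonize_record(cohort, record, mapping=MAPPING):
    """Rename a record's variables to harmonized names.

    Returns the renamed record plus the list of source variables that
    have no entry in the mapping (left for manual review).
    """
    harmonized, unmapped = {}, []
    for name, value in record.items():
        target = mapping.get((cohort, name))
        if target is None:
            unmapped.append(name)
        else:
            harmonized[target] = value
    return harmonized, unmapped


def consolidate(datasets, mapping=MAPPING):
    """Merge per-cohort records into one list of harmonized rows."""
    rows = []
    for cohort, records in datasets.items():
        for record in records:
            row, _ = harmonize_record(cohort, record, mapping)
            row["cohort"] = cohort  # keep provenance for cross-cohort analysis
            rows.append(row)
    return rows
```

Calling `consolidate({"COHORT_A": [...], "COHORT_B": [...]})` yields rows that share the same variable names regardless of the originating study.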

The curation workflow comprises three main steps:

1. Data acquisition and review
2. Data harmonization
3. Data transformation/curation and QC
 

 

Data Acquisition and Review

Based on the priority assigned by the AMP PD Clinical Data Harmonization (CDH) group, the data was split into two batches: Subset 1 and Subset 2. The following considerations guided the prioritization of clinical data to be harmonized:

  • Key variables critical for interpreting biological data (e.g. demographics)
  • Variables to increase ease of use of biological data (e.g. genotype)
  • Relevance and importance to Parkinson's disease
  • Data complementary to biologic data generated through AMP PD
  • Identified as the highest priority based on collective input from research experts in the PD community

 

 

Data Harmonization



Metadata variables were harmonized based on data compatibility, following the suggestions, decisions, and final approval of the Clinical Data Harmonization (CDH) group. CDISC terminology was used for Title and Description where available. Values of harmonized variables from different studies were standardized and included in the coding file. The coding file decodes numeric coded variables; cleans up and standardizes medication names, diagnoses, levels of education, etc.; and aligns visit names between cohorts.
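
As a rough illustration of what a coding file does, the sketch below standardizes values for a few harmonized variables. The codes, medication spellings, and visit labels (such as "M0" and "M12") are invented for the example and are not taken from the AMP PD coding file.

```python
# Hypothetical coding-file content: per variable, raw value -> standardized value.
# Keys are compared after trimming whitespace and lower-casing.
CODING = {
    "sex": {"1": "Male", "2": "Female", "m": "Male", "f": "Female"},
    "visit_name": {"bl": "M0", "baseline": "M0", "v04": "M12", "month 12": "M12"},
    "medication": {"l-dopa": "Levodopa", "levodopa": "Levodopa"},
}


def standardize(variable, raw_value, coding=CODING):
    """Return the standardized value, or the raw value if no decode exists."""
    key = str(raw_value).strip().lower()
    return coding.get(variable, {}).get(key, raw_value)


def apply_coding(row, coding=CODING):
    """Standardize every variable in a harmonized row."""
    return {var: standardize(var, val, coding) for var, val in row.items()}
```

Values without a decode pass through unchanged here; in practice such values would more likely be flagged for a curator than silently accepted.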
 

 

Data Transformation/Curation and QC

Both automated (the custom SmartConverter tool) and manual approaches were used to perform data transformations. The original data files were inspected for extended ASCII characters, the number of patients, visit types, and the availability of codes and their decodes in supporting study documents. Transformation templates and the coding file were prepared based on the harmonized dictionary and curation decisions to perform three rounds of transformation/consolidation using SmartConverter. After each round, output files were inspected, and additional manual transformations were performed before the next round of automated transformation and after the final curation. Subset 1 and Subset 2 were curated separately using the same approach, described below:

Step 1: Transform Raw Data

  1. Prepare vocabularies and add to primary code file
  2. Organize data-files by study
  3. Create coding file and transformation template
  4. Run SmartConverter Round 1 and perform QC

Step 2: Transform & Consolidate

  1. Organize curated files into distinct study folders
  2. Modify transformation template
  3. Consolidate subset 1 and subset 2 categories
  4. Run SmartConverter Round 2 and perform QC

Step 3: Transform & Finalize

  1. Add and consolidate clinical data (e.g. missing diagnosis inputs)
  2. Remove and substitute fields
  3. Run SmartConverter Round 3 and perform QC
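
The inspection of original data files mentioned above (extended ASCII characters, number of patients, visit types) could be approximated as follows. This is a sketch under assumptions, not SmartConverter: the field names `participant_id` and `visit` are hypothetical, and "extended ASCII" is interpreted here as any character outside the 7-bit ASCII range.

```python
def find_non_ascii(text):
    """Return the sorted distinct characters in text that fall outside 7-bit ASCII."""
    return sorted({ch for ch in text if ord(ch) > 127})


def inspect_rows(rows, id_field="participant_id", visit_field="visit"):
    """Summarize a raw data file given as a list of dict rows.

    Reports the number of distinct participants, the visit types seen,
    and any non-ASCII characters found in any value.
    """
    participants, visits, odd_chars = set(), set(), set()
    for row in rows:
        participants.add(row[id_field])
        visits.add(row[visit_field])
        for value in row.values():
            odd_chars.update(find_non_ascii(str(value)))
    return {
        "n_participants": len(participants),
        "visit_types": sorted(visits),
        "non_ascii": sorted(odd_chars),
    }
```

Running the same summary before and after each transformation round gives a simple way to confirm that participants or visit types were not lost along the way.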

Clinical Data Validation Plan

The AMP PD Clinical Data Harmonization (CDH) team crafted a plan to further validate the results of the harmonization process. The purpose of the validation plan was to: 

  1. Ensure no new errors were introduced into the clinical data as a result of the data harmonization process
  2. Facilitate identification of records that should be excluded from the public release
  3. Identify a set of tests that can be run to validate additional data submission from the current AMP PD cohorts as well as future data submissions from new cohorts

The CDH team constructed 42 individual cohort tests, identified 23 unique tests to run against harmonized data from all four cohorts, and identified 19 tests that were not valid against harmonized data because of excluded or modified data points or changes to data structures.

The following key decisions and outputs resulted from executing the validation plan:

  1. Alignment of SmartConverter data outputs against program- and cohort-specific tests

  2. Final inclusion/exclusion release criteria for clinical data

  3. Secondary dataset(s) for further analysis and curation for potential future release

  4. Confirmed AMP PD Subject Master List

  5. Final AMP PD clinical dataset for public release

   
 

Cohort & Across Cohort Business Rules

AMP PD received cohort-specific business rules from BioFIND, HBS, PDBP, and PPMI. The cohorts applied these rules to the raw data inputs before the clinical data harmonization process, and subsequent datasets were required to follow them. As part of the QC process, the business rules were re-checked after harmonization to ensure they were still valid.

 

HBS Cohort Specific Data Checks

Test | Description
Discordant Sex Check | Reported sex should be same across multiple visits and studies
REM Sleep behavior Disorder Questionnaire Check | Check RBD checklist score does not exceed 13
UPDRS total score checking | Check total score does not exceed 199
UPDRS subscale score checking | Check UPDRS subscale scores do not exceed the following: Section I: 16 points; Section II: 52; Section III: 108; and Section IV: 23
MMSE outlier check | Check MMSE score does not exceed 30
Change in diagnosis | Check consistency of diagnosis across multiple visits and studies
Medical history consistency | Check consistency of medical history across multiple visits and studies (if lifetime condition reported "YES" in one visit, following visits should not be "NO")
Family history consistency | Check consistency of family history across multiple visits and studies (if lifetime condition reported for family member "YES" in one visit, following visits should not be "NO")
PD risk factor consistency | Check consistency of PD risk factors across multiple visits and studies (if lifetime risk reported "YES" in one visit, following visits should not be "NO")
Known pregnancy | Check that pregnancy marked "N/A" in males
Height | Check consistency of reported height across multiple visits and studies
Age consistency | Check consistency of age, adjusting for time, across multiple visits and studies
Ethnicity consistency | Ethnicity should be same across multiple visits and studies
Race consistency | Race should be same across multiple visits and studies
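
Two kinds of HBS checks listed above lend themselves to a short sketch: the score ceilings and the "YES must not revert to NO" rule for lifetime conditions. The maxima are the ones stated in the table; the field names and the input layout are assumptions made for illustration.

```python
# Score ceilings from the HBS checks; the field names are hypothetical.
MAX_SCORES = {
    "rbd_total": 13,
    "updrs_total": 199,
    "updrs_part1": 16,
    "updrs_part2": 52,
    "updrs_part3": 108,
    "updrs_part4": 23,
    "mmse_total": 30,
}


def out_of_range(record, limits=MAX_SCORES):
    """Return the fields whose value exceeds its ceiling; missing values are skipped."""
    return [
        field
        for field, limit in limits.items()
        if record.get(field) is not None and record[field] > limit
    ]


def lifetime_reversals(answers):
    """Return indexes of 'NO' answers that follow an earlier 'YES'.

    `answers` must be ordered chronologically by visit.
    """
    seen_yes = False
    flagged = []
    for index, answer in enumerate(answers):
        if answer == "YES":
            seen_yes = True
        elif answer == "NO" and seen_yes:
            flagged.append(index)
    return flagged
```

Note that the four UPDRS subscale ceilings (16, 52, 108, and 23) add up to the total-score ceiling of 199, so the two UPDRS checks are mutually consistent.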
Race consistencyRace should be same across multiple visits and studies

LBD Cohort Specific Data Checks

Test | Description
Age consistency | Check consistency of age, adjusting for time, across multiple visits and studies
Ethnicity consistency | Ethnicity should be same across multiple visits and studies
Race consistency | Race should be same across multiple visits and studies
Sex consistency check | Sex should be same for the same GUID across PDBP cohorts
Missing form check | The required clinical assessment not filled or not submitted
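
One possible reading of "age consistency, adjusting for time" is that the change in reported age between two visits should match the time elapsed between them. The sketch below implements that reading. The one-year tolerance is an assumption (it allows for ages recorded in whole years) and is not a documented AMP PD threshold.

```python
from datetime import date


def age_inconsistencies(visits, tolerance_years=1.0):
    """Flag visits whose reported age disagrees with the time elapsed.

    `visits` is a list of (visit_date, reported_age) tuples for one
    participant. Each visit is compared with the earliest one.
    """
    if not visits:
        return []
    ordered = sorted(visits)
    base_date, base_age = ordered[0]
    flagged = []
    for visit_date, age in ordered[1:]:
        elapsed_years = (visit_date - base_date).days / 365.25
        if abs((age - base_age) - elapsed_years) > tolerance_years:
            flagged.append((visit_date, age))
    return flagged
```

For example, a participant aged 60 in January 2015 and reported as 67 in January 2018 would be flagged, because about three years elapsed while the reported age rose by seven.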

PDBP Cohort Specific Data Checks (also applied to STEADY-PD3 and SURE-PD3)

Test | Description
Sex consistency check | Sex should be same for the same GUID across PDBP cohorts
Ethnicity consistency check | Ethnicity should be same for the same GUID across PDBP cohorts
Race consistency check | Race should be same for the same GUID across PDBP cohorts
Age consistency check | For the same GUID and same Visit type, age should be the same (multi enrolled subjects are exceptions)
Visit date consistent checking | For the same GUID and same Visit type, visit date should be the same (multi enrolled subjects are exceptions)
Neurological Examination self-conflict checking | 'InclusnXclusnCntrlInd' should be consistent with 'NeuroExamPrimaryDiagnos'
MoCA inconsistent with education level check | Whether subject got their 0 or 1 score according to the education level
MoCA outliers check | Check MoCA score higher than 30
MDS-UPDRS Part III score scale check | For case, part 3 score should be >10; for control, part 3 score should be <=10 (0-10)
MDS-UPDRS Part III score trend check | Control subject scores should decrease, while Case subject scores should increase
MoCA control check | Check MoCA score lower than 20 if subject is a control
Missing form check | The required clinical assessment not filled or not submitted
Retention rate check | Check drop-outs per site
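
The GUID-based consistency checks and the Part III scale check above might be sketched as follows. The record layout (dicts with a `guid` key) and the group labels `"case"` and `"control"` are assumptions; only the >10 and 0-10 thresholds come from the table.

```python
from collections import defaultdict


def conflicting_guids(records, field):
    """Return GUIDs that carry more than one distinct value for `field`.

    Works for any demographic field expected to be constant per GUID,
    such as sex, ethnicity, or race.
    """
    values = defaultdict(set)
    for record in records:
        values[record["guid"]].add(record[field])
    return sorted(guid for guid, seen in values.items() if len(seen) > 1)


def part3_scale_flag(group, part3_score):
    """Return True when an MDS-UPDRS Part III score breaks the scale rule.

    Cases are expected to score above 10; controls are expected to
    score within 0-10.
    """
    if group == "case":
        return part3_score <= 10
    if group == "control":
        return not (0 <= part3_score <= 10)
    raise ValueError(f"unknown group: {group!r}")
```

A flag from either function marks a record for review rather than proving an error, in the same way the table notes that multi-enrolled subjects are exceptions to some checks.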

PPMI Cohort Specific Data Checks (also applied to BioFIND and LCC)

Test | Description
Enrollment pending | Check for consented subjects who have not yet enrolled or screen failed after 2 months or more
Premature Withdrawal (PW) consistency | Check for agreement in PW status across datasets: CONCL, Reportable Events (Incidents)
MOCA outliers | Check for MOCA scores > 30
MDS-UPDRS Part III data checks [1] | Check for Control subjects with two Part III scores at same visit
MDS-UPDRS Part III data checks [2] | Check for subjects with ON score worse than OFF score at same visit
MDS-UPDRS Part III data checks [3] | Check for subjects with 2 ON scores at same visit
MDS-UPDRS Part III data checks [4] | Check for subjects with 2 OFF scores at same visit
MDS-UPDRS Part III data checks [5] | Check for subjects with two different PD_MED_USE at same visit
PD Medication start date consistency | Start date for initiation of PD medications should agree across datasets: CONMED, Reportable Events (Incidents)
PD Medication Use consistency | PD Med Use should agree for subjects at each visit across datasets: PDMEDUSE, NUPDRS3, CONMED, Reportable Events (Incidents)
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [1] | Check that clinical data visit labels match lab/imaging visit labels
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [2] | Check for lab/imaging results at visits where the clinical data indicates the lab/image was not collected
CONMED data checks [1] | Check for typos/unknown values in dose units and dose frequencies
CONMED data checks [2] | Check for conmeds missing WHODRUG classification
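
Part III data checks [2] to [4] above can be approximated with a small grouping routine. This is an illustrative sketch only: the tuple layout and the visit labels are hypothetical, and "worse" is taken to mean a higher score, since higher MDS-UPDRS scores indicate greater impairment.

```python
from collections import defaultdict


def part3_visit_flags(exams):
    """Flag Part III anomalies per subject and visit.

    `exams` is an iterable of (subject, visit, state, score) tuples,
    where state is 'ON' or 'OFF'. Returns (key, reason) pairs, with
    key being the (subject, visit) tuple.
    """
    by_visit = defaultdict(lambda: {"ON": [], "OFF": []})
    for subject, visit, state, score in exams:
        by_visit[(subject, visit)][state].append(score)

    flags = []
    for key in sorted(by_visit):
        on_scores = by_visit[key]["ON"]
        off_scores = by_visit[key]["OFF"]
        if len(on_scores) > 1:
            flags.append((key, "multiple ON scores"))
        if len(off_scores) > 1:
            flags.append((key, "multiple OFF scores"))
        # Compare ON with OFF only when the pair is unambiguous.
        if len(on_scores) == 1 and len(off_scores) == 1 and on_scores[0] > off_scores[0]:
            flags.append((key, "ON worse than OFF"))
    return flags
```

Visits with duplicated ON or OFF scores are reported as duplicates only, because it is unclear which of the repeated scores should be compared.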