Clinical Assessment Data

AMP PD harmonizes, or standardizes, similar data collected across BioFIND, HBS, LBD, LCC, PDBP, PPMI, STEADY-PD3 and SURE-PD3. This data curation and transformation process facilitates and simplifies cross-cohort analysis. More specifically, variable names from AMP PD studies are aligned to a global mapping file and final curation is reviewed by AMP PD; this Harmonized Dictionary, based on CDISC terminology, is available as a reference for the harmonized clinical dataset and is linked with the Harmonized Assessment and Variable Matrix. Harmonized cohort data is made available in AMP PD through BigQuery.

Quality control of the AMP PD clinical data was performed by Alena Fedarovich of Rancho BioSciences and by Bary Landin and Dave Vismer of Technome, as part of a contract with the Foundation for the National Institutes of Health (FNIH).

Rancho BioSciences

Technome

What's on this page:

Assessment & Variable Matrix

Data Curation Workflow

Data Acquisition & Review

Data Harmonization

Transformation, Curation & Validation

Harmonized Assessment & Variable Matrix

The following variables are harmonized across a breadth of standard assessments from two or more AMP PD cohorts. Click a variable to view additional details such as its definition, values, schema, and curation notes. If you want to download a version of the full AMP PD Data Dictionary, click one of the buttons below for a specific format.

Release 1 Data Dictionary (.xlsx)

Release 2 Data Dictionary (.xlsx)

Assessments	Harmonized Variables	BioFIND	HBS	LBD	LCC	PDBP	PPMI	SURE-PD3	STEADY-PD3
Enrollment	Participant_ID
	Study_Arm
	Screening (enrollment date, informed consent date)
	Genetic Information Variables
Demographics	Visit_Name
	Demographic Screening
	Sex, ethnicity, race
	Education Level
Medical History	Initiation and use of PD Medication
	Diagnosis and Change in Diagnosis
	PD Surgery (DBS)
Environment Risk Factors	Smoking
	Alcohol
	Caffeine History
Clinical Assessments	MDS-UPDRS Part I
	MDS-UPDRS Part II
	MDS-UPDRS Part III
	MDS-UPDRS Part IV
	H and Y (see MDS-UPDRS Part III)
	UPDRS
	MoCA
	UPSIT
	Epworth Sleepiness Scale (ESS)
	REM Sleep Behavior Disorder Questionnaire - Mayo
	REM Sleep Behavior Disorder Questionnaire - Stiasny-Kolster
	PDQ-39
	Modified Schwab and England ADL
Biospecimen Analyses	Sphingolipids (plasma and CSF)
	Glucocerebrosidase (plasma or CSF)
	A-beta
	Tau
	P-Tau

Figure: Data harmonization infographic. AMP PD cohorts: data ingestion from disparate sources (BioFIND, PPMI, PDBP, and HBS), each an independent clinical dataset. Rancho BioSciences SmartConverter and curation pipeline: data curation, harmonization, and standardization. AMP PD Knowledge Portal: Google Cloud Platform.

Data Curation Workflow

Data from four different Parkinson's disease studies were harmonized to the same standard, curated, and consolidated into one dataset using automated and manual approaches. To harmonize and standardize metadata for the AMP PD project, a global mapping file (the Harmonized Dictionary) aligning variables between datasets was first created. CDISC terminology was used for harmonized variable names and descriptions when possible. A coding file was then created to decode numerically coded variables; clean up and standardize medication names, diagnoses, levels of education, etc.; and align visit names between cohorts. After the mapping and coding files were generated, an automated tool was applied to transform the data files and integrate the four datasets into one set of curated files. Manual inspection of the transformed files followed each phase of automated transformation. The content of each transformed file was approved by a curator, and all needed adjustments were made manually. Finally, mapping files (dictionaries) for uploading data into BigQuery tables were produced by processing the content of the curated dataset with an additional R script.
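As a rough illustration of how a global mapping file can drive consolidation, the sketch below renames cohort-specific variables to harmonized names and sets aside anything unmapped for manual review. It is not the SmartConverter implementation; the cohort labels, variable names, and the `MAPPING` structure are all hypothetical.

```python
# Hypothetical global mapping file: (cohort, source variable) -> harmonized name.
# None of these names are taken from the AMP PD Harmonized Dictionary.
MAPPING = {
    ("COHORT_A", "SUBJ_ID"): "participant_id",
    ("COHORT_A", "GENDER"): "sex",
    ("COHORT_B", "PATNO"): "participant_id",
    ("COHORT_B", "SEX_CD"): "sex",
}


def harmonize_record(cohort, record):
    """Rename source variables to harmonized names and collect unmapped ones."""
    harmonized = {"cohort": cohort}
    unmapped = []
    for source_name, value in record.items():
        target = MAPPING.get((cohort, source_name))
        if target is None:
            unmapped.append(source_name)  # left for a curator to review
        else:
            harmonized[target] = value
    return harmonized, unmapped


def consolidate(datasets):
    """Merge per-cohort records into one list of harmonized records."""
    rows, review = [], []
    for cohort, records in datasets.items():
        for record in records:
            row, unmapped = harmonize_record(cohort, record)
            rows.append(row)
            review.extend((cohort, name) for name in unmapped)
    return rows, review


datasets = {
    "COHORT_A": [{"SUBJ_ID": "A-001", "GENDER": "F", "SITE": "01"}],
    "COHORT_B": [{"PATNO": "B-100", "SEX_CD": "1"}],
}
rows, review = consolidate(datasets)
print(rows)
print(review)
```

Keeping the unmapped variables in a separate review list mirrors the manual inspection that followed each automated phase, rather than silently dropping data.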

The curation workflow consists of three main steps:

1. Data acquisition and review
2. Data harmonization
3. Data transformation/curation and QC

Data Acquisition and Review

Based on the priority assigned by the AMP PD Clinical Data Harmonization (CDH) group, the data was split into two batches: Subset 1 and Subset 2. Clinical data were prioritized for harmonization using the following considerations:

Key variables critical for interpreting biological data (e.g. demographics)
Variables to increase ease of use of biological data (e.g. genotype)
Relevance and importance to Parkinson's disease
Data complementary to biologic data generated through AMP PD
Identified as the highest priority based on collective input from research experts in the PD community

Data Harmonization

Metadata variables were harmonized according to data compatibility, following the suggestions, decisions, and final approval of the Clinical Data Harmonization (CDH) group. CDISC terminology was used for Title and Description where available. Values of harmonized variables from different studies were standardized and included in the coding file. The coding file contains decodes for numerically coded variables; cleans up and standardizes medication names, diagnoses, levels of education, etc.; and aligns visit names between cohorts.
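The role of the coding file can be pictured with a small sketch: one lookup decodes cohort-specific codes into standardized values, and another aligns visit names. The code tables, cohort labels, and visit aliases below are invented for illustration and do not reflect the actual AMP PD coding file.

```python
# Hypothetical coding file content: decodes per (cohort, harmonized variable).
DECODES = {
    ("COHORT_A", "sex"): {"F": "Female", "M": "Male"},
    ("COHORT_B", "sex"): {"0": "Female", "1": "Male"},
}

# Hypothetical visit-name aliases, keyed by lowercased source label.
VISIT_ALIASES = {
    "bl": "Baseline",
    "baseline": "Baseline",
    "v01": "Baseline",
    "m06": "Month 6",
    "month 6": "Month 6",
}


def decode_value(cohort, variable, raw):
    """Return the standardized value, or the raw value if no decode exists."""
    table = DECODES.get((cohort, variable), {})
    return table.get(str(raw).strip(), raw)


def align_visit(raw_visit):
    """Return the aligned visit name, or None to flag it for manual review."""
    return VISIT_ALIASES.get(raw_visit.strip().lower())


print(decode_value("COHORT_B", "sex", 1))
print(align_visit(" BL "))
print(align_visit("unscheduled"))
```

Returning `None` for an unknown visit label, instead of guessing, leaves the decision to a curator.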

Data Transformation/Curation and QC

Both automated (custom SmartConverter tool) and manual approaches were used to perform data transformations. The original data files were inspected for extended ASCII characters, the number of patients, visit types, and the availability of codes and their decodes in supporting study documents. Transformation templates and the coding file were prepared from the harmonized dictionary and curation decisions to perform three rounds of transformation and consolidation using SmartConverter. After each round, the output files were inspected and additional manual transformations were performed, both before the next round of automated transformation and after the final curation. Subset 1 and Subset 2 were curated separately using the same approach, described below:

Step 1: Transform Raw Data

Prepare vocabularies and add to primary code file
Organize data-files by study
Create coding file and transformation template
Run SmartConverter Round 1 and perform QC

Step 2: Transform & Consolidate

Organize curated files into distinct study folders
Modify transformation template
Consolidate subset 1 and subset 2 categories
Run SmartConverter Round 2 and perform QC

Step 3: Transform & Finalize

Add and consolidate clinical data (e.g. missing diagnosis inputs)
Remove and substitute fields
Run SmartConverter Round 3 and perform QC
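The inspection of original data files mentioned at the start of this section (extended ASCII characters, number of patients, visit types) might look roughly like the following. This is a sketch only: it assumes a comma-delimited file with a header row, and the column names `id` and `visit` are hypothetical.

```python
import csv
import io


def inspect_raw_file(text, id_column, visit_column):
    """Summarize a delimited raw file before transformation."""
    # Flag any character outside 7-bit ASCII, which includes extended ASCII.
    non_ascii = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        bad = sorted({ch for ch in line if ord(ch) > 127})
        if bad:
            non_ascii.append((line_no, "".join(bad)))

    participants, visits = set(), set()
    for row in csv.DictReader(io.StringIO(text)):
        participants.add(row[id_column])
        visits.add(row[visit_column])

    return {
        "non_ascii": non_ascii,
        "n_participants": len(participants),
        "visit_types": sorted(visits),
    }


# Hypothetical sample: the third line contains a registered-trademark sign.
sample = (
    "id,visit,medication\n"
    "P1,Baseline,levodopa\n"
    "P1,Month 6,Sinemet\u00ae\n"
    "P2,Baseline,none\n"
)
report = inspect_raw_file(sample, "id", "visit")
print(report)
```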

Clinical Data Validation Plan

The AMP PD Clinical Data Harmonization (CDH) team crafted a plan to further validate the results of the harmonization process. The purpose of the validation plan was to:

Ensure no new errors were introduced into the clinical data as a result of the data harmonization process
Facilitate identification of records that should be excluded from the public release
Identify a set of tests that can be run to validate additional data submission from the current AMP PD cohorts as well as future data submissions from new cohorts

The CDH team constructed 42 individual cohort tests, identified 23 unique tests to run against harmonized data from all four cohorts, and identified 19 tests that were not valid against harmonized data because of excluded or modified data points or changes to data structures.

The following key decisions and outputs were made as a result of executing the validation plan:

1. Alignment of SmartConverter data outputs against program and cohort specific tests

2. Final inclusion/exclusion release criteria for clinical data

3. Secondary dataset(s) for further analysis and curation for potential future release

4. Confirmed AMP PD Subject Master List

5. Final AMP PD clinical dataset for public release

Cohort & Across Cohort Business Rules

AMP PD received cohort-specific business rules from BioFIND, HBS, PDBP, and PPMI. The cohorts applied these rules to the raw data inputs before the clinical data harmonization process, and subsequent datasets were required to follow them. As part of the QC process, the business rules were re-checked after harmonization to ensure they were still valid.

HBS Cohort Specific Data Checks

Test	Description
Discordant Sex Check	Reported sex should be same across multiple visits and studies
REM Sleep Behavior Disorder Questionnaire Check	Check RBD checklist score does not exceed 13
UPDRS total score checking	Check total score does not exceed 199
UPDRS subscale score checking	Check UPDRS subscale scores do not exceed the following: Section I: 16 points; Section II: 52; Section III: 108; and Section IV: 23
MMSE outlier check	Check MMSE score does not exceed 30
Change in diagnosis	Check consistency of diagnosis across multiple visits and studies
Medical history consistency	Check consistency of medical history across multiple visits and studies (if lifetime condition reported "YES" in one visit, following visits should not be "NO")
Family history consistency	Check consistency of family history across multiple visits and studies (if lifetime condition reported for family member "YES" in one visit, following visits should not be "NO")
PD risk factor consistency	Check consistency of PD risk factors across multiple visits and studies (if lifetime risk reported "YES" in one visit, following visits should not be "NO")
Known pregnancy	Check that pregnancy marked "N/A" in males
Height	Check consistency of reported height across multiple visits and studies
Age consistency	Check consistency of age, adjusting for time, across multiple visits and studies
Ethnicity consistency	Ethnicity should be same across multiple visits and studies
Race consistency	Race should be same across multiple visits and studies
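Two kinds of check recur in the HBS table: upper-bound checks on scores, and "lifetime" consistency checks where a "YES" must not be followed by a "NO". A minimal sketch of both follows. The upper bounds come from the table above; the field names and record layout are assumptions.

```python
# Upper bounds from the HBS check descriptions; the field names are hypothetical.
MAX_SCORES = {
    "rbd_checklist": 13,
    "updrs_total": 199,
    "updrs_section_1": 16,
    "updrs_section_2": 52,
    "updrs_section_3": 108,
    "updrs_section_4": 23,
    "mmse": 30,
}


def out_of_range(record):
    """Return the names of scored fields that exceed their maximum."""
    return sorted(
        name
        for name, value in record.items()
        if name in MAX_SCORES and value > MAX_SCORES[name]
    )


def lifetime_reversals(answers):
    """Flag positions where "NO" follows an earlier "YES".

    `answers` holds one participant's chronologically ordered responses
    for a single lifetime condition or risk factor.
    """
    seen_yes = False
    flagged = []
    for index, answer in enumerate(answers):
        if answer == "YES":
            seen_yes = True
        elif answer == "NO" and seen_yes:
            flagged.append(index)
    return flagged


print(out_of_range({"mmse": 31, "updrs_total": 120, "updrs_section_3": 109}))
print(lifetime_reversals(["NO", "YES", "NO", "YES", "NO"]))
```

The four section maxima sum to 199, which matches the total-score bound in the table.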

LBD Cohort Specific Data Checks

Test	Description
Age consistency	Check consistency of age, adjusting for time, across multiple visits and studies
Ethnicity consistency	Ethnicity should be same across multiple visits and studies
Race consistency	Race should be same across multiple visits and studies
Sex consistency check	Sex should be same for the same GUID across PDBP cohorts
Missing form check	The required clinical assessment not filled or not submitted
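An age consistency check that adjusts for time could be sketched as below: each visit's reported age is compared with the age expected from the earliest visit plus elapsed time. The one-year tolerance is an assumption, chosen because ages recorded in whole years can legitimately differ by up to a year; the source does not state the threshold actually used.

```python
from datetime import date


def age_inconsistencies(visits, tolerance_years=1.0):
    """Return visit dates whose reported age disagrees with elapsed time.

    `visits` is a list of (visit_date, reported_age) pairs for one participant.
    Each visit is compared against the earliest visit.
    """
    ordered = sorted(visits)
    base_date, base_age = ordered[0]
    flagged = []
    for visit_date, age in ordered[1:]:
        elapsed_years = (visit_date - base_date).days / 365.25
        if abs((age - base_age) - elapsed_years) > tolerance_years:
            flagged.append(visit_date)
    return flagged


# Hypothetical participant: the third visit reports an implausible age jump.
visits = [
    (date(2015, 1, 10), 60),
    (date(2016, 1, 15), 61),
    (date(2017, 2, 1), 66),
]
print(age_inconsistencies(visits))
```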

PDBP Cohort Specific Data Checks (also applied to STEADY-PD3 and SURE-PD3)

Test	Description
Sex consistency check	Sex should be same for the same GUID across PDBP cohorts
Ethnicity consistency check	Ethnicity should be same for the same GUID across PDBP cohorts
Race consistency check	Race should be same for the same GUID across PDBP cohorts
Age consistency check	For the same GUID and same Visit type, age should be the same (multi enrolled subjects are exceptions)
Visit date consistency check	For the same GUID and same Visit type, visit date should be the same (multi enrolled subjects are exceptions)
Neurological Examination self-conflict checking	'InclusnXclusnCntrlInd' should be consistent with 'NeuroExamPrimaryDiagnos'
MoCA inconsistent with education level check	Check whether the subject received the 0 or 1 education point according to their education level
MoCA outliers check	Check for MoCA scores higher than 30
MDS-UPDRS Part III score scale check	For cases, the Part III score should be >10; for controls, the Part III score should be <=10 (0-10)
MDS-UPDRS Part III score trend check	Control subject scores should decrease, while Case subject scores should increase
MoCA control check	Check for MoCA scores lower than 20 in control subjects
Missing form check	The required clinical assessment not filled or not submitted
Retention rate check	Check drop-outs per site
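The per-GUID consistency checks and the MDS-UPDRS Part III scale check from this table can be sketched as follows. The thresholds (cases above 10, controls from 0 to 10) come from the table; the record layout and the `guid`, `sex`, `case`, and `control` labels are assumptions.

```python
from collections import defaultdict


def inconsistent_guids(records, field):
    """Return GUIDs that carry more than one distinct value for `field`."""
    values = defaultdict(set)
    for record in records:
        values[record["guid"]].add(record[field])
    return sorted(guid for guid, seen in values.items() if len(seen) > 1)


def part3_scale_flag(group, score):
    """Return True if a Part III score falls outside the expected range.

    Cases are expected to score above 10, controls between 0 and 10.
    """
    if group == "case":
        return score <= 10
    if group == "control":
        return not (0 <= score <= 10)
    raise ValueError(f"unknown group: {group!r}")


# Hypothetical records: G1 is reported with two different sexes.
records = [
    {"guid": "G1", "sex": "F"},
    {"guid": "G1", "sex": "M"},
    {"guid": "G2", "sex": "M"},
    {"guid": "G2", "sex": "M"},
]
print(inconsistent_guids(records, "sex"))
print(part3_scale_flag("control", 14))
```

The same `inconsistent_guids` helper applies to ethnicity and race by passing a different field name.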

PPMI Cohort Specific Data Checks (also applied to BioFIND and LCC)

Test	Description
Enrollment pending	Check for consented subjects who have not yet enrolled or screen failed after 2 months or more
Premature Withdrawal (PW) consistency	Check for agreement in PW status across datasets: CONCL, Reportable Events (Incidents)
MoCA outliers	Check for MoCA scores > 30
MDS-UPDRS Part III data checks [1]	Check for Control subjects with two Part III scores at same visit
MDS-UPDRS Part III data checks [2]	Check for subjects with ON score worse than OFF score at same visit
MDS-UPDRS Part III data checks [3]	Check for subjects with 2 ON scores at same visit
MDS-UPDRS Part III data checks [4]	Check for subjects with 2 OFF scores at same visit
MDS-UPDRS Part III data checks [5]	Check for subjects with two different PD_MED_USE at same visit
PD Medication start date consistency	Start date for initiation of PD medications should agree across datasets: CONMED, Reportable Events (Incidents)
PD Medication Use consistency	PD Med Use should agree for subjects at each visit across datasets: PDMEDUSE, NUPDRS3, CONMED, Reportable Events (Incidents)
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [1]	Check that clinical data visit labels match lab/imaging visit labels
Lab/imaging checks (includes Datscan, MRI, CSF, COVANCE) [2]	Check for lab/imaging results at visits where the clinical data indicates the lab/image was not collected
CONMED data checks [1]	Check for typos/unknown values in dose units and dose frequencies
CONMED data checks [2]	Check for conmeds missing WHODRUG classification
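Several of the MDS-UPDRS Part III data checks above operate on all scores recorded for one subject at one visit. A sketch of checks [2] through [4] follows, assuming a hypothetical row layout with `participant`, `visit`, `state`, and `score` fields. Because higher Part III scores indicate greater motor impairment, "ON worse than OFF" is read here as an ON score higher than an OFF score.

```python
from collections import defaultdict


def part3_visit_flags(rows):
    """Flag duplicate ON/OFF scores and ON-worse-than-OFF at the same visit."""
    grouped = defaultdict(lambda: {"ON": [], "OFF": []})
    for row in rows:
        key = (row["participant"], row["visit"])
        grouped[key][row["state"]].append(row["score"])

    flags = []
    for key in sorted(grouped):
        on_scores = grouped[key]["ON"]
        off_scores = grouped[key]["OFF"]
        if len(on_scores) > 1:
            flags.append((key, "two ON scores"))
        if len(off_scores) > 1:
            flags.append((key, "two OFF scores"))
        # Flag when any ON score is higher (worse) than any OFF score.
        if on_scores and off_scores and max(on_scores) > min(off_scores):
            flags.append((key, "ON worse than OFF"))
    return flags


# Hypothetical rows for three participants at the same visit.
rows = [
    {"participant": "P1", "visit": "V04", "state": "ON", "score": 30},
    {"participant": "P1", "visit": "V04", "state": "OFF", "score": 25},
    {"participant": "P2", "visit": "V04", "state": "ON", "score": 12},
    {"participant": "P2", "visit": "V04", "state": "ON", "score": 14},
    {"participant": "P3", "visit": "V04", "state": "ON", "score": 10},
    {"participant": "P3", "visit": "V04", "state": "OFF", "score": 18},
]
flags = part3_visit_flags(rows)
print(flags)
```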