Genomic Data

Genomic data in AMP PD are contributed and curated by the Global Parkinson’s Genetics Program (GP2). To increase ancestral diversity in PD genetic datasets, GP2 is working with researchers around the world and aims to genotype over 150,000 individuals, including people with Parkinson’s disease and controls. Visit the GP2 dashboard to view GP2’s progress so far and the cohorts participating in this effort to date.

Genotyping was performed using the Illumina NeuroBooster array, which includes a Global Diversity Array-8 (GDA) backbone. The array was developed to test 1.9 million markers and includes ~95,000 neurodegenerative disease-specific variants. Samples coming from GP2 are being genotyped at various international centers using standard Illumina infrastructure. For more information on the content of the NeuroBooster array, please see the NeuroBooster GitHub.

GP2 Ancestry	Total	PD	Non-PD
African Admixed	162	81	81
Ashkenazi Jewish	520	396	124
Latino and Indigenous people of the Americas	214	152	62
East Asian	46	35	11
European	3919	2732	1187
South Asian	47	38	9
Total	4908	3434	1474

Genotype Data Processing

Genotype data were produced and clustered using standard Illumina genotyping protocols, and quality control of the data was performed by GP2.

All data processing was performed against ancestry-appropriate reference panels as well as TOPMed or equivalent.

Genotype Quality Control Process

The genotype calling and quality control practices can be found on GitHub [https://github.com/GP2code/GenoTools] and are constantly being improved and updated. In general, the QC involves filtering on basic sample-level metrics such as call rate and extreme heterozygosity and homozygosity outliers, as well as sex checks that compare clinical and genetically ascertained sex. Ancestry will be genomically adjudicated and the data will be subsetted accordingly, i.e., all samples passing QC are split by ancestry and imputed within ancestry. Related samples are retained for imputation and removed before further analysis, keeping probands, to maximize case:control balance. Standard pre- and post-imputation variant filtering will be carried out, including retaining variants that have an imputation quality score (RSQ) >= 0.3 and a minor allele count >= 10.
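The post-imputation variant filter described above can be illustrated with a minimal sketch. This is not the GenoTools implementation; the `ImputedVariant` record, its field names, and the example variant IDs are hypothetical and exist only to show how the two thresholds quoted in the text (RSQ >= 0.3 and minor allele count >= 10) combine.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_RSQ = 0.3
MIN_MINOR_ALLELE_COUNT = 10


@dataclass
class ImputedVariant:
    """Hypothetical record for one imputed variant (illustration only)."""
    variant_id: str
    rsq: float               # imputation quality score
    minor_allele_count: int  # number of minor alleles observed


def passes_post_imputation_filter(variant: ImputedVariant) -> bool:
    """Keep a variant only if it meets both thresholds."""
    return (
        variant.rsq >= MIN_RSQ
        and variant.minor_allele_count >= MIN_MINOR_ALLELE_COUNT
    )


# Made-up example variants, not real data.
variants = [
    ImputedVariant("var1", rsq=0.92, minor_allele_count=150),
    ImputedVariant("var2", rsq=0.25, minor_allele_count=40),  # low RSQ
    ImputedVariant("var3", rsq=0.3, minor_allele_count=10),   # at both thresholds
    ImputedVariant("var4", rsq=0.8, minor_allele_count=9),    # low allele count
]

kept = [v.variant_id for v in variants if passes_post_imputation_filter(v)]
print(kept)
```

Both comparisons are inclusive, so a variant sitting exactly on a threshold is retained, matching the ">=" wording in the text.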

Specific ancestry populations include: European, Ashkenazi Jewish, Finnish, African, African-American/Caribbean, Native American, South Asian, East Asian.

Genotype data products provided by GP2

PLINK2 binary containing filtered imputed SNPs
HDF5 containing metadata with counts of samples/variants removed at each QC step, ancestry makeup of the cohort, and PC/UMAP plots

Whole Genome Data

GP2 includes whole genome sequencing (WGS) data for some study participants. For GP2, sequencing was performed using an Illumina NovaSeq (or newer/equivalent) or the Illumina HiSeq XTen sequencer, with samples coming from whole blood, saliva, or brain tissue.
Quality control of the sequenced data is to be performed by GP2.

WGS Sample Processing

For GP2, the WGS workflow was run using Cromwell, the execution engine from the Broad Institute, with workflows written in the Workflow Description Language (WDL) and published by the Broad Institute.

WGS Integrated Quality Control Process

Quality control included:
Sample Quality

Contamination (Freemix < 3%)
Coverage (Mean coverage < 25)
WGS metric outliers (TiTv < 2)
Missingness (missingness genotype rates per sample > 5%)
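The sample-quality thresholds above can be sketched as a simple per-sample check. The list does not state explicitly whether each inequality describes a passing or a failing sample, so this sketch assumes that Freemix < 3% is the passing condition and that the other three inequalities (mean coverage < 25, TiTv < 2, missingness > 5%) describe samples to be flagged. The function name, argument names, and the use of fractions rather than percentages are hypothetical choices, not part of the GP2 pipeline.

```python
def sample_qc_failures(freemix, mean_coverage, titv, missingness):
    """Return the list of sample-quality checks that a sample fails.

    Assumed interpretation (not confirmed by the source):
      - freemix and missingness are fractions (0.03 means 3%)
      - a sample fails on contamination when freemix is NOT below 3%
      - a sample fails when mean coverage < 25, TiTv < 2,
        or per-sample genotype missingness > 5%
    """
    failures = []
    if freemix >= 0.03:
        failures.append("contamination")
    if mean_coverage < 25:
        failures.append("coverage")
    if titv < 2:
        failures.append("titv")
    if missingness > 0.05:
        failures.append("missingness")
    return failures


# Made-up metric values for two illustrative samples.
clean_sample = sample_qc_failures(
    freemix=0.01, mean_coverage=32.0, titv=2.1, missingness=0.01
)
poor_sample = sample_qc_failures(
    freemix=0.05, mean_coverage=20.0, titv=1.9, missingness=0.08
)
print(clean_sample, poor_sample)
```

Returning the names of the failed checks, rather than a single pass/fail flag, makes it easier to tally how many samples are removed at each step.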

Genetic Data Checks

Duplication check
Clinically reported sex
Excessive heterozygosity / homozygosity
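As an illustration of the sex check in the list above, the following minimal sketch compares clinically reported sex with genetically inferred sex and reports discordant samples. The sample IDs, the "M"/"F" labels, and the dictionary layout are hypothetical, and how the genetic sex call is derived is outside the scope of this sketch.

```python
def sex_mismatches(clinical_sex, genetic_sex):
    """Return sorted sample IDs whose clinical and genetic sex disagree.

    Both arguments map a sample ID to a sex label. Samples missing from
    either mapping are skipped because they cannot be compared.
    """
    shared_ids = clinical_sex.keys() & genetic_sex.keys()
    return sorted(
        sample_id
        for sample_id in shared_ids
        if clinical_sex[sample_id] != genetic_sex[sample_id]
    )


# Made-up example records, not real data.
clinical = {"S1": "F", "S2": "M", "S3": "F", "S4": "M"}
genetic = {"S1": "F", "S2": "F", "S3": "F"}  # S4 has no genetic call

discordant = sex_mismatches(clinical, genetic)
print(discordant)
```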

Data Availability
Following processing and QC, the following data are made available by GP2.
Data products for WGS data provided by GP2

PLINK
CRAM
VCF
Metric files
CSV files