GP2 10th Data Release Notes – June 2025

The Components of GP2’s 10th Data Release

Tags

Research Operations; Research Collaboration; Complex Disease Genetics; Release
 

Authors

Hampton Leonard
DataTecnica/National Institutes of Health | USA

Hampton has a background in data science and machine learning, which she applies to large multi-omic datasets in the neurodegenerative disease space. She is passionate about investigating differences on both clinical and omic levels and how these differences can affect clinical trial outcomes.

Mike Nalls
DataTecnica/National Institutes of Health | USA

Mike founded Data Tecnica in early 2017 after over a decade of experience in large dataset analytics and methods research in healthcare and other scientific fields. Mike has 400+ peer-reviewed publications in the field of applied statistics in large datasets, brain diseases, and genomics. He is a strong advocate of open science, collaboration, and transparency in science.

Dan Vitale
DataTecnica/National Institutes of Health | USA

Dan is a data science consultant for Data Tecnica, consulting primarily for the Laboratory of Neurogenetics and CARD at the National Institute on Aging of the National Institutes of Health. His work is focused on open science, automation, development of genetic analytic pipelines and software, and machine learning.

Mathew Koretsky
DataTecnica/National Institutes of Health | USA

Mat is a data science consultant for Data Tecnica, consulting primarily for CARD at the National Institute on Aging of the National Institutes of Health. He is passionate about pipeline development and meaningful applications of computer science in the biomedical research space.

Kristin Levine
DataTecnica/National Institutes of Health | USA

Kristin works with the Data Tecnica and National Institute on Aging (NIA) teams on data and code sharing plus real-world data analysis of biobanks and healthcare systems. She is also an accomplished writer, now applying her communication skills to scientific domains.

Mary B Makarious
DataTecnica/National Institutes of Health | USA

Mary is a biomedical data scientist committed to open science principles and enhancing diversity in genomic studies. With her background in machine learning, data science, and genetics, she analyzes large-scale multi-omics datasets to develop open, reproducible pipelines and user-friendly notebooks and tools. Her efforts aim to empower others to effectively explore and interpret their own data and to foster a more inclusive and collaborative scientific community.

Lietsel Jones
DataTecnica/National Institutes of Health | USA

Lietsel is an analyst with Data Tecnica with a keen interest in the intersection between epidemiology and genetics. She is also a clinical data manager with GP2 working to collect and harmonize large clinical datasets from worldwide contributors.

Zih-Hua Fang
German Center for Neurodegenerative Diseases | Germany

Zih-Hua leads the whole-genome sequencing data analysis efforts in GP2 and contributes to GP2’s work on monogenic and familial Parkinson’s disease.

J Solle
Michael J. Fox Foundation for Parkinson’s Research | USA

J is the implementation Program Lead for GP2, co-lead for the Operations & Compliance Working Group, and a member of the Operations Committee.

On behalf of the GP2 Operations & Compliance, Complex Disease Data Analysis, Monogenic Data Analysis, Clinical Integration, and Data and Code Dissemination Working Groups.


Overview

In July 2025, GP2 announced the 10th data release on the Terra and the Verily® Workbench platforms in collaboration with AMP® PD. This release includes 11,109 additional genotyped participants and 13,339 additional WGS participants.

  • The genotype array (NBA) data, including locally-restricted samples, now consists of a total of 82,944 genotyped participants (36,939 PD cases, 19,821 Controls, and 26,184 ‘Other’ phenotypes).
    • Excluding the locally-restricted samples, the array data consists of 65,303 samples (28,586 PD cases, 15,258 Controls, and 21,459 ‘Other’ phenotypes).
  • The whole genome sequencing (WGS) data now consists of a total of 21,073 sequenced participants (8,134 PD cases, 3,531 Controls, and 9,408 ‘Other’ phenotypes).
    • Excluding the locally-restricted samples, the WGS data consists of 16,608 participants (6,801 PD cases, 3,244 Controls, and 6,563 ‘Other’ phenotypes).
    • Of note, cases recruited via the Monogenic network are coded as ‘Other’.
  • The clinical exome data now consists of 10,454 samples with PD (Release 8).
  • Of the 92,021 unique samples with genetic data (NBA, WGS, or clinical exome), 26,982 individuals also have additional extended clinical information.
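
As a quick consistency check on the counts above, the per-phenotype numbers should add up to each stated total, and the difference between the two totals gives the number of locally-restricted samples. A minimal Python sketch follows; the dictionary layout and names are illustrative only, and the figures are those quoted above.

```python
# Counts quoted in the overview: (PD cases, Controls, 'Other'), stated total.
COUNTS = {
    "NBA, all samples": ((36_939, 19_821, 26_184), 82_944),
    "NBA, excluding locally-restricted": ((28_586, 15_258, 21_459), 65_303),
    "WGS, all samples": ((8_134, 3_531, 9_408), 21_073),
    "WGS, excluding locally-restricted": ((6_801, 3_244, 6_563), 16_608),
}


def is_consistent(parts, total):
    """Return True when the phenotype counts add up to the stated total."""
    return sum(parts) == total


for label, (parts, total) in COUNTS.items():
    status = "OK" if is_consistent(parts, total) else "MISMATCH"
    print(f"{label}: {sum(parts):,} vs {total:,} -> {status}")

# Locally-restricted samples are the difference between the two totals.
nba_restricted = (
    COUNTS["NBA, all samples"][1]
    - COUNTS["NBA, excluding locally-restricted"][1]
)
wgs_restricted = (
    COUNTS["WGS, all samples"][1]
    - COUNTS["WGS, excluding locally-restricted"][1]
)
print(f"Locally-restricted: NBA {nba_restricted:,}, WGS {wgs_restricted:,}")
```

With the figures above, all four checks pass, and the differences come to 17,641 locally-restricted NBA samples and 4,465 locally-restricted WGS samples.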

What’s New In This Release?

Expanding Genomic Data
This release introduces a substantial expansion in the number of participants with available genetic data. We have added:

  • 11,109 new participants with genotype array (NBA) data
  • 13,339 new participants with whole genome sequencing (WGS) data
  • 12,311 new participants with extended clinical data
  • A family file (and corresponding data dictionary) which reports pairwise kinship estimates between individuals within families. It includes both inferred relationships (with kinship coefficients) and reported relationships.
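
To illustrate how inferred and reported relationships in a file like this can be compared, here is a minimal Python sketch. It assumes the kinship coefficients are on the KING scale and uses KING's conventional inference cutoffs; the release notes do not state which estimator was used, and the sample IDs, row layout, and relationship labels below are hypothetical. Please check the data dictionary for the actual columns.

```python
# Conventional KING cutoffs for kinship coefficients (lower bounds).
# Assumption: the family file's kinship values are on the KING scale.
# Boundary handling (strict ">") is a simplification.
KING_CUTOFFS = [
    (0.354, "duplicate or monozygotic twin"),
    (0.177, "1st degree"),
    (0.0884, "2nd degree"),
    (0.0442, "3rd degree"),
]

# Illustrative mapping from a reported relationship to its expected degree.
EXPECTED_DEGREE = {
    "parent-child": "1st degree",
    "full sibling": "1st degree",
    "half sibling": "2nd degree",
    "first cousin": "3rd degree",
}


def infer_degree(kinship: float) -> str:
    """Map a kinship coefficient to a relationship degree."""
    for lower_bound, label in KING_CUTOFFS:
        if kinship > lower_bound:
            return label
    return "unrelated or more distant"


def discordant_pairs(pairs):
    """Return ID pairs whose inferred degree disagrees with the reported one."""
    return [
        (id1, id2)
        for id1, id2, kinship, reported in pairs
        if infer_degree(kinship) != EXPECTED_DEGREE[reported]
    ]


# Hypothetical rows: (ID 1, ID 2, kinship coefficient, reported relationship).
PAIRS = [
    ("P1", "P2", 0.251, "full sibling"),
    ("P1", "P3", 0.120, "full sibling"),
    ("P4", "P5", 0.060, "first cousin"),
]

print(discordant_pairs(PAIRS))
```

In this made-up example, only the second pair is flagged: its coefficient falls in the 2nd-degree range although a full-sibling (1st-degree) relationship was reported.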


Inclusion of PAR Region in Imputation
We’ve reintroduced the pseudoautosomal (PAR) region in the imputation of genotype array data, improving coverage and interpretation of sex chromosome variation. This change is part of ongoing efforts to broaden genomic coverage and improve analytic accuracy.

Joint Calling Now Includes AMP® PD Cohorts

  • The jointly-called WGS variant sets now include samples from the following five AMP® PD cohorts: BioFind, PPMI, LCC, STEADY-PD3 and SURE-PD3.
    • Processing these samples together with GP2 samples, rather than independently, minimizes missingness and artifacts and improves genotype accuracy.
  • We have added a column to the master key denoting which GP2 samples are also present in the AMP® PD dataset.
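
One practical use of the new master key column is to avoid counting the same participant twice when GP2 and AMP® PD data are combined. The Python sketch below shows the idea on a small inline CSV; the column names (`GP2ID`, `in_amp_pd`), the 0/1 coding, and the sample IDs are assumptions made for illustration, so please check the release README and data dictionary for the real ones.

```python
import csv
import io

# Hypothetical excerpt of a master key; real column names and coding may differ.
MASTER_KEY_CSV = """GP2ID,study,in_amp_pd
SAMPLE_000001,COHORT_A,0
SAMPLE_000002,PPMI,1
SAMPLE_000003,COHORT_B,0
SAMPLE_000004,BioFind,1
"""


def split_by_amp_pd(handle, flag_column="in_amp_pd", id_column="GP2ID"):
    """Split sample IDs into those also present in AMP PD and GP2-only ones."""
    overlap, gp2_only = [], []
    for row in csv.DictReader(handle):
        target = overlap if row[flag_column] == "1" else gp2_only
        target.append(row[id_column])
    return overlap, gp2_only


overlap, gp2_only = split_by_amp_pd(io.StringIO(MASTER_KEY_CSV))
print("Also in AMP PD:", overlap)
print("GP2 only:", gp2_only)
```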

Targeted Imputation of rs3115534 Across Select Ancestries
In response to strong community interest in the intronic variant rs3115534, which has been associated with increased risk of Parkinson’s disease and REM sleep behavior disorder and has been functionally validated, we have implemented a targeted imputation strategy to ensure its inclusion in the released datasets.

  • Specifically, chromosome 1 was imputed for five ancestries (AFR, AAC, AMR, MDE, and CAH) using the 1000 Genomes Phase 3 30x high coverage reference panel.
  • Following imputation, data for rs3115534 was merged back into the TOPMed-based imputed files provided with GP2 releases. Note that imputation metrics for this variant did not meet quality thresholds (R2 < 0.3) in other ancestry groups.

rs3115534 Release 10 Imputation Metrics using Phase 3 30x 1000 Genomes Panel

| Population | Status | AF | MAF | AVG_CS | R2 |
|---|---|---|---|---|---|
| AFR | IMPUTED | 0.761049 | 0.238951 | 0.992879 | 0.968831 |
| AAC | IMPUTED | 0.855586 | 0.144414 | 0.993458 | 0.959606 |
| AMR | IMPUTED | 0.983414 | 0.0165855 | 0.991453 | 0.507857 |
| MDE | IMPUTED | 0.980407 | 0.0195926 | 0.990912 | 0.584081 |
| CAH | IMPUTED | 0.93793 | 0.0620703 | 0.993982 | 0.909959 |
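
For readers working with these metrics, the MAF column is the frequency of the less common allele, that is min(AF, 1 - AF), and the quality note above amounts to keeping only ancestries where R2 is at least 0.3. A minimal Python sketch using the values from the table follows; the function and variable names are illustrative and not part of any GP2 pipeline.

```python
# Values copied from the rs3115534 Release 10 imputation metrics table.
# Tuple layout: (AF, MAF, AVG_CS, R2)
METRICS = {
    "AFR": (0.761049, 0.238951, 0.992879, 0.968831),
    "AAC": (0.855586, 0.144414, 0.993458, 0.959606),
    "AMR": (0.983414, 0.0165855, 0.991453, 0.507857),
    "MDE": (0.980407, 0.0195926, 0.990912, 0.584081),
    "CAH": (0.93793, 0.0620703, 0.993982, 0.909959),
}

R2_THRESHOLD = 0.3  # quality threshold quoted in the release notes


def minor_allele_frequency(af: float) -> float:
    """Frequency of the less common allele, given the reported AF."""
    return min(af, 1.0 - af)


def passes_r2(r2: float, threshold: float = R2_THRESHOLD) -> bool:
    """True when the imputation R2 meets the quality threshold."""
    return r2 >= threshold


for population, (af, maf, avg_cs, r2) in METRICS.items():
    derived = minor_allele_frequency(af)
    # The reported MAF should match the derived value up to rounding.
    matches = abs(derived - maf) < 1e-5
    print(f"{population}: MAF={derived:.6f} matches={matches} "
          f"R2 pass={passes_r2(r2)}")
```

All five released ancestries clear the R2 threshold, although AMR and MDE, where the minor allele is rare, do so with noticeably lower R2 than AFR, AAC, and CAH.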

New Summary Statistics Now Available
We’ve made available several new GWAS summary statistics datasets, expanding global representation:

  • GP2’s European (EUR) meta-GWAS (pre-print; GitHub)
  • South African GWAS (pre-print pending; GitHub)
  • Indian GWAS (pre-print; GitHub)
  • RBD (REM Sleep Behavior Disorder) GWAS (pre-print pending; GitHub pending)
  • LARGE-PD GWAS, which includes Latino American participants (pre-print pending; GitHub pending)

Clinical Data
This release contains clinical data for a total of 92,021 individuals who have genetic and core clinical data available. Of these, 26,982 have deep clinical phenotyping data available. This information consists of:

  • Age at diagnosis and onset
  • Primary, current, and latest diagnoses
  • Cognitive exams such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA)
  • Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS)
  • Detailed “other” phenotypes, such as Lewy body dementia (LBD)

Individual-Level Data
We now capture the data from a total of 124 cohorts. Please refer to the GP2 Cohort Dashboard for more information on the cohorts that have been shared.

The genetically determined ancestry of GP2 participants is broken into 11 ancestry groups; the tables below provide details of the genetically determined ancestry of participants in this release that have passed quality control for array data and whole genome sequencing data. In the first two tables, the value in parentheses (+VWB) also counts the locally-restricted samples that are available only on the Verily Workbench. These numbers reflect samples from previous releases, reclustered using the updated cluster file and subjected to quality control, as well as newly genotyped samples exclusive to this release. The final table provides information about the genetically determined ancestry of selected other, non-PD phenotypes.

Array Genotyped Data - GP2 Release 10

| Ancestry | Total (+VWB) | PD (+VWB) | Control (+VWB) | Other (+VWB) |
|---|---|---|---|---|
| African | 3,754 (3,780) | 1,181 (1,191) | 2,305 (2,307) | 268 (282) |
| African Admixed | 1,192 (1,215) | 361 (370) | 760 (763) | 71 (82) |
| Ashkenazi Jewish | 3,265 (3,472) | 1,482 (1,531) | 408 (435) | 1,375 (1,506) |
| Latino and Indigenous people of the Americas | 3,564 (3,608) | 1,974 (1,995) | 1,433 (1,439) | 157 (174) |
| East Asian | 6,619 (6,662) | 2,393 (2,411) | 2,697 (2,705) | 1,529 (1,546) |
| European | 41,901 (58,823) | 18,703 (26,778) | 5,899 (10,372) | 17,299 (21,673) |
| South Asian | 801 (945) | 270 (317) | 260 (269) | 271 (359) |
| Central Asian | 1,670 (1,691) | 776 (782) | 624 (626) | 270 (283) |
| Middle Eastern | 1,349 (1,493) | 675 (752) | 535 (559) | 139 (182) |
| Finnish | 116 (144) | 87 (106) | 8 (12) | 21 (26) |
| Complex Admixture | 1,072 (1,111) | 684 (706) | 329 (334) | 59 (71) |
| Total | 65,303 (82,944) | 28,586 (36,939) | 15,258 (19,821) | 21,459 (26,184) |

Whole Genome Sequenced Data - GP2 Release 10

| Ancestry | Total (+VWB) | PD (+VWB) | Control (+VWB) | Other (+VWB) |
|---|---|---|---|---|
| African | 1,671 (1,696) | 646 (656) | 848 (853) | 177 (187) |
| African Admixed | 254 (267) | 126 (130) | 113 (114) | 15 (23) |
| Ashkenazi Jewish | 1,389 (1,485) | 337 (355) | 100 (106) | 952 (1,024) |
| Latino and Indigenous people of the Americas | 301 (333) | 154 (171) | 24 (24) | 123 (138) |
| East Asian | 2,525 (2,542) | 576 (582) | 343 (343) | 1,606 (1,617) |
| European | 8,354 (12,461) | 4,155 (5,389) | 1,131 (1,397) | 3,068 (5,675) |
| South Asian | 309 (417) | 47 (73) | 10 (16) | 252 (328) |
| Central Asian | 833 (840) | 259 (261) | 329 (330) | 245 (249) |
| Middle Eastern | 788 (824) | 386 (394) | 308 (309) | 94 (121) |
| Finnish | 22 (30) | 17 (20) | 4 (4) | 1 (6) |
| Complex Admixture | 162 (178) | 98 (103) | 34 (35) | 30 (40) |
| Total | 16,608 (21,073) | 6,801 (8,134) | 3,244 (3,531) | 6,563 (9,408) |

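Selected Other (Non-PD) Phenotypes by Ancestry, NBA/WGS - GP2 Release 10

| Ancestry | Prodromal NBA/WGS | PSP NBA/WGS | AD NBA/WGS | DLB NBA/WGS | MSA NBA/WGS | CBD/CBS NBA/WGS | FTD NBA/WGS |
|---|---|---|---|---|---|---|---|
| African | 16/7 | 6/4 | 0/0 | 2/0 | 7/4 | 1/0 | 0/0 |
| African Admixed | 23/7 | 4/2 | 1/0 | 0/0 | 2/0 | 1/0 | 0/0 |
| Ashkenazi Jewish | 308/71 | 23/12 | 9/0 | 14/6 | 8/3 | 4/3 | 2/1 |
| Latino and Indigenous people of the Americas | 30/11 | 5/0 | 5/0 | 2/0 | 2/0 | 1/0 | 0/0 |
| East Asian | 27/4 | 14/63 | 4/4 | 18/0 | 6/178 | 2/32 | 0/0 |
| European | 4206/848 | 1307/920 | 484/136 | 442/340 | 421/334 | 166/159 | 65/63 |
| South Asian | 3/2 | 34/32 | 1/0 | 5/1 | 5/8 | 9/9 | 2/2 |
| Central Asian | 4/4 | 4/1 | 70/72 | 4/1 | 1/0 | 4/1 | 0/0 |
| Middle Eastern | 14/1 | 9/4 | 2/2 | 1/0 | 0/0 | 1/1 | 1/1 |
| Finnish | 9/0 | 2/1 | 2/0 | 0/0 | 1/1 | 0/0 | 1/0 |
| Complex Admixture | 9/2 | 7/5 | 5/4 | 3/1 | 1/0 | 0/0 | 1/1 |
| Total | 4649/957 | 1415/1044 | 583/218 | 491/349 | 454/528 | 189/205 | 72/68 |
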
Snapshot of Clinical Data - GP2 Release 10 (on VWB)

| Clinical Data | N, Unique IDs | N, IDs with Follow-up |
|---|---|---|
| Age at Sample Collection | 71,747 | - |
| Age at Onset | 38,718 | - |
| Age at Diagnosis | 31,667 | - |
| Basic Family History | 92,021 | - |
| Demographics | 26,701 | - |
| Hoehn & Yahr Stage | 11,486 | 5,515 |
| UPDRS Part 1 Score | 2,359 | 1,057 |
| UPDRS Part 2 Score | 2,338 | 1,049 |
| UPDRS Part 3 Score | 3,606 | 1,084 |
| UPDRS Part 4 Score | 1,739 | 1,090 |
| MDS UPDRS Part 1 Score | 5,168 | 2,802 |
| MDS UPDRS Part 2 Score | 5,242 | 2,854 |
| MDS UPDRS Part 3 Score | 7,532 | 2,870 |
| MDS UPDRS Part 4 Score | 2,479 | 1,016 |
| MOCA | 9,500 | 2,753 |
| MMSE | 1,954 | - |
| RBD Score | 3,986 | 3,290 |
| Head Trauma | 5,495 | 3,747 |
| Vitals | 5,895 | 4,035 |
| Smell | 5,200 | 1,466 |

Data Access

Locality-restricted GDPR samples via the Verily Viewpoint Workbench

We are continuing to pilot granting access to locally-restricted samples, otherwise known as samples governed by the General Data Protection Regulation (GDPR) policy, through our collaboration with the Verily Viewpoint Workbench.

At this time, as GP2 continues to roll out data sharing solutions for GDPR-protected data, release 10 data with regional restrictions will be available only to GP2 consortium members and partners. As testing and implementation continue in 2025, this solution will be made available to the broader research community. All release 10 samples can be found on Workbench, while all release 10 samples not governed by GDPR requirements can be found on the community workbench on Terra (like all previous releases). To gain access to the full release on VWB you must:

  1. Have approved GP2 Tier 2 access
  2. Fill out the GDPR-governed sample request form
  3. Be a GP2 consortium member (contributing cohort, GP2 partner, or project analyses team member)

Future data releases will continue to grow the diversity of participants available. You can check out our dashboard to see our progress. Users who already have Tier 2 access can explore the data further on our cohort browser, which is described in a previous blog post.

As always, please refer to the README that accompanies each GP2 release for further details regarding recommendations for quality control, pipelines, data, and analyses!