GP2 9th Data Release Notes - December 2024
The Components of GP2’s 9th Data Release
Tags
Research Operations; Research Collaboration; Complex Disease Genetics; Release
Authors
Hampton Leonard
DataTecnica/National Institutes of Health | USA
Hampton has a background in data science and machine learning, which she applies to large multi-omic datasets in the neurodegenerative disease space. She is passionate about investigating differences on both clinical and omic levels and how these differences can affect clinical trial outcomes.
Mike Nalls
DataTecnica/National Institutes of Health | USA
Mike founded Data Tecnica in early 2017 after over a decade of experience in large dataset analytics and methods research in healthcare and other scientific fields. Mike has 400+ peer-reviewed publications in the field of applied statistics in large datasets, brain diseases, and genomics. He is a strong advocate of open science, collaboration, and transparency in science.
Dan Vitale
DataTecnica/National Institutes of Health | USA
Dan is a data science consultant for Data Tecnica, consulting primarily for the Laboratory of Neurogenetics and CARD at the National Institute on Aging of the National Institutes of Health. His work is focused on open science, automation, development of genetic analytic pipelines and software, and machine learning.
Mathew Koretsky
DataTecnica/National Institutes of Health | USA
Mat is a data science consultant for Data Tecnica, consulting primarily for CARD at the National Institute on Aging of the National Institutes of Health. He is passionate about pipeline development and meaningful applications of computer science in the biomedical research space.
Kristin Levine
Data Tecnica/National Institutes of Health | USA
Kristin works with the Data Tecnica and National Institute on Aging (NIA) teams on data and code sharing plus real-world data analysis of biobanks and healthcare systems. She is also an accomplished writer, now applying her communication skills to scientific domains.
Mary B Makarious
Data Tecnica/National Institutes of Health | USA
Mary is a biomedical data scientist committed to open science principles and enhancing diversity in genomic studies. With her background in machine learning, data science, and genetics, she analyzes large-scale multi-omics datasets to develop open, reproducible pipelines and user-friendly notebooks and tools. Her efforts aim to empower others to effectively explore and interpret their own data and to foster a more inclusive and collaborative scientific community.
Lietsel Jones
DataTecnica/National Institutes of Health | USA
Lietsel is an analyst with Data Tecnica with a keen interest in the intersection between epidemiology and genetics. She is also a clinical data manager with GP2 working to collect and harmonize large clinical datasets from worldwide contributors.
Zih-Hua Fang
German Center for Neurodegenerative Diseases | Germany
Zih-Hua leads the monogenic data analysis efforts in GP2 and is making significant contributions to GP2’s efforts to study monogenic and familial Parkinson’s disease.
J Solle
Michael J. Fox Foundation for Parkinson’s Research | USA
J is the implementation Program Lead for GP2, co-lead for the Operations & Compliance Working Group, and a member of the Operations Committee.
On behalf of the GP2 Operations & Compliance, Complex Disease Data Analysis, Monogenic Data Analysis, Clinical Integration, and Data and Code Dissemination Working Groups.
Overview
In December 2024, GP2 announced the 9th data release on the Terra and the Verily® Workbench platforms in collaboration with AMP® PD. This release includes 17,690 additional genotyped participants.
- The genotype array data, including locally-restricted samples, now comprise a total of 71,835 genotyped participants (31,985 PD cases, 18,249 Controls, and 21,601 ‘Other’ phenotypes).
- Excluding the locally-restricted samples, the data comprise 55,305 participants (23,709 PD cases, 13,404 Controls, and 18,192 ‘Other’ phenotypes).
- Of those 71,835 samples with genotyped data:
  - 16,800 individuals also have deep clinical phenotyping information (Release 8)
  - 10,454 individuals also have clinical exome data (Release 8)
  - 7,732 individuals also have WGS data (Release 8)
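As a quick consistency check on the counts above, the phenotype breakdowns can be summed and compared with the stated totals. The following is a minimal sketch that uses only the figures quoted in this overview; the variable names are illustrative and are not part of any GP2 tooling.

```python
# Counts quoted in the overview (PD cases, Controls, 'Other' phenotypes).
with_restricted = {"PD": 31_985, "Control": 18_249, "Other": 21_601}
without_restricted = {"PD": 23_709, "Control": 13_404, "Other": 18_192}

total_with = sum(with_restricted.values())
total_without = sum(without_restricted.values())

# Locally-restricted samples are the difference between the two breakdowns.
restricted_only = {
    group: with_restricted[group] - without_restricted[group]
    for group in with_restricted
}

print(total_with)     # 71835
print(total_without)  # 55305
print(restricted_only, sum(restricted_only.values()))
```

Both sums match the totals stated above, and the difference between them gives the number of locally-restricted participants implied by those totals.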
What’s New In This Release?
- Regarding additional data:
  - We have added 17,690 genotyped participants to Release 9.
  - For researchers who would prefer their raw data to be flipped and aligned, we are providing raw_genotypes_flipped in addition to the raw_genotypes.
- Regarding sample identifiers:
  - The ‘m-’ prefix used to denote cohorts originally recruited through the monogenic network has been deprecated.
  - The ‘_s*’ suffix for GP2 sample naming is being deprecated. Sample number is still available via the master key to enable matching with previous release IDs, but GP2 sample IDs in the genetic files will no longer include the ‘_s*’ suffix.
  - PPMI GP2IDs in all files have been updated to include their PATNO ID to make it easier for researchers working across platforms.
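For analyses that still hold IDs from earlier releases, the identifier changes above might be handled with a small normalization step. This is only a sketch under stated assumptions: the ID shape used here (`<cohort>_<number>_s<n>`, with an optional `m-` prefix) and the helper `normalize_gp2_id` are hypothetical, and it assumes a new ID is simply the old ID without the prefix and suffix. The master key remains the authoritative source for matching IDs across releases.

```python
import re

# Assumed legacy ID shape, for illustration only: "<cohort>_<number>_s<n>",
# optionally with an "m-" prefix on the cohort code.
_SAMPLE_SUFFIX = re.compile(r"_s\d+$")


def normalize_gp2_id(sample_id: str) -> str:
    """Drop the deprecated 'm-' prefix and '_s*' suffix from an ID (hypothetical helper)."""
    stripped = _SAMPLE_SUFFIX.sub("", sample_id.strip())
    return stripped[2:] if stripped.startswith("m-") else stripped


# Made-up example IDs; real GP2 IDs may differ.
legacy_ids = ["COHORTA_000123_s1", "m-COHORTB_000045_s2", "COHORTC_000007"]
print([normalize_gp2_id(i) for i in legacy_ids])
```

IDs that already lack the prefix and suffix pass through unchanged, so the function can be applied to a mix of old and new identifiers.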
Locality-restricted GDPR samples via the Verily Viewpoint Workbench
We are continuing to pilot granting access to locally-restricted samples, otherwise known as samples governed by the General Data Protection Regulation (GDPR) policy, through our collaboration with the Verily Viewpoint Workbench.
At this time, as GP2 continues to roll out data sharing solutions for GDPR-protected data, Release 9 data with regional restrictions will be available only to GP2 consortium members and partners. As testing and implementation continue in 2024, this solution will be made available to the broader research community. All Release 9 samples can be found on the Verily Workbench (VWB), while all Release 9 samples not governed by GDPR requirements can be found on the community workbench on Terra (like all previous releases). To gain access to the full release on VWB, you must:
- Have approved GP2 Tier 2 access
- Fill out the GDPR-governed sample request form
- Be a GP2 consortium member (contributing cohort, GP2 partner, or project analyses team member)
Clinical Data
This release contains clinical data for a total of 71,835 individuals who have genetic and core clinical data available. Deep clinical phenotyping data and genetic data are available for 16,800 individuals in this release. This information includes:
- Age at diagnosis and onset
- Primary, current, and latest diagnoses
- Cognitive exams such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA)
- Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS)
- Detailed “other” phenotypes, such as Lewy body dementia (LBD)
- Cases recruited via the Monogenic network are coded as ‘Other’
Individual-Level Data
We now capture the data from a total of 104 cohorts. Please refer to the GP2 Cohort Dashboard for more information on the cohorts that have been shared.
The genetically-determined ancestry of array-genotyped GP2 participants is broken into 11 ancestry groups; the table below details the genetically-determined ancestry of genotyped participants in this release that have passed quality control and been imputed. These numbers reflect samples from previous releases, reclustered using the updated cluster file and subjected to quality control, as well as newly genotyped samples exclusive to this release. In each cell, the first number excludes locally-restricted samples, and the number in parentheses (+VWB) includes them.
Array Genotyped Data - GP2 Release 9
Ancestry | Total (+VWB) | PD (+VWB) | Control (+VWB) | Other (+VWB) |
African | 2,747 (2,767) | 983 (992) | 1,728 (1,730) | 36 (45) |
African Admixed | 1,216 (1,230) | 336 (341) | 835 (837) | 45 (52) |
Ashkenazi Jewish | 3,100 (3,251) | 1,451 (1,497) | 405 (431) | 1,244 (1,323) |
Latino and Indigenous people of the Americas | 3,540 (3,581) | 1,945 (1,972) | 1,450 (1,457) | 145 (152) |
East Asian | 5,865 (5,902) | 1,864 (1,882) | 2,436 (2,445) | 1,565 (1,575) |
European | 35,296 (51,247) | 15,196 (23,198) | 5,475 (10,230) | 14,625 (17,819) |
South Asian | 623 (731) | 193 (235) | 208 (217) | 222 (279) |
Central Asian | 1,071 (1,089) | 609 (617) | 345 (347) | 117 (125) |
Middle Eastern | 732 (866) | 398 (477) | 200 (224) | 134 (165) |
Finnish | 112 (137) | 88 (106) | 5 (9) | 19 (22) |
Complex Admixture | 1,003 (1,034) | 646 (668) | 317 (322) | 40 (44) |
Total | 55,305 (71,835) | 23,709 (31,985) | 13,404 (18,249) | 18,192 (21,601) |
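Each cell in the table pairs two counts, so it can help to split them before doing any arithmetic. The sketch below parses one row as it appears above and checks that the PD, Control, and Other counts add up to the row total. The function name `parse_cell` and the parsing approach are illustrative assumptions, not part of any GP2 tooling.

```python
import re

# A cell looks like "35,296 (51,247)": first count, then the (+VWB) count.
CELL = re.compile(r"^\s*([\d,]+)\s*\(([\d,]+)\)\s*$")


def parse_cell(cell: str) -> tuple[int, int]:
    """Return (count, count_with_vwb) parsed from a single table cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    first, second = (int(g.replace(",", "")) for g in match.groups())
    return first, second


row = "European | 35,296 (51,247) | 15,196 (23,198) | 5,475 (10,230) | 14,625 (17,819) |"
label, *cells = [part.strip() for part in row.split("|") if part.strip()]
total, pd_cases, control, other = (parse_cell(c) for c in cells)

# The three phenotype columns should sum to the Total column in both views.
consistent = all(pd_cases[i] + control[i] + other[i] == total[i] for i in (0, 1))
print(label, total, consistent)
```

The same check can be repeated for the other rows, or down each column against the Total row.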