AMP PD Release 4.0 Release Notes – November 2023

November 17, 2023

AMP PD is pleased to announce that the Version 4 data release is now available through the AMP PD Knowledge Portal and AMP PD in Terra! All users with a currently valid Data Use Agreement (DUA) can access the new data in the portal.

New in this release:

Postmortem Brain Cohort of 100 single nucleus participants with clinical, WGS, and RNASeq data from samples of 5 brain regions, released as single-sample and aggregate sequence, bam, vcf, and h5ad data
Untargeted Proteomics Data from 621 participants in the PDBP and PPMI cohorts, with 4,278 samples from plasma and CSF sources at the fragment, protein, and peptide levels, along with normalized batch-corrected matrix data
Targeted Proteomics Data from two datasets, comprising 413 PDBP and PPMI participants in 5 targeted CSF and plasma assays, has been normalized to enable analysis across all previously released targeted proteomics data

Data Composition

AMP PD's Release 4.0 public dataset includes 10,908 participants in a Unified Cohort comprising nine cohorts: BioFIND, HBS, LBD, LCC, PDBP, PPMI, Steady PD, Sure PD, and the new Postmortem Cohort.

AMP PD includes clinical data for all participants, as well as WGS data for 10,418 joint genotyped samples, transcriptomics data for 3,364 participants that includes 8,461 whole blood bulk RNA samples, proteomics data from 695 participants in targeted and untargeted proteomics datasets, and single nucleus data from 100 participants with matching clinical, WGS, and RNA samples from 5 brain regions.

Within this release, 3,007 participants have matching clinical records, WGS joint genotyped samples, and RNA samples. Of these, 672 also have overlapping proteomics data within the targeted and untargeted datasets.
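The overlap counts above amount to set intersections over participant identifiers. The following is a minimal sketch of that arithmetic, assuming each data type's participants are available as a collection of IDs. The IDs and the helper name `overlap` are hypothetical and are not taken from AMP PD tables.

```python
def overlap(*id_collections):
    """Return the participant IDs present in every collection."""
    sets = [set(ids) for ids in id_collections]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical participant IDs, for illustration only.
clinical = {"P-001", "P-002", "P-003", "P-004"}
wgs = {"P-001", "P-002", "P-004"}
rna = {"P-002", "P-004", "P-005"}
proteomics = {"P-004", "P-006"}

# Participants with matching clinical, WGS, and RNA data.
core = overlap(clinical, wgs, rna)
# Of those, participants who also have proteomics data.
with_proteomics = overlap(core, proteomics)

print(len(core), sorted(core))
print(len(with_proteomics), sorted(with_proteomics))
```

Counting the second set as a subset of the first mirrors how the release describes the 672 participants as a subset of the 3,007.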

Release 4.0 Data

Proteomics
Proteomics within Release 4.0 includes a bridged targeted proteomics dataset generated on the Olink platform from CSF and plasma targeted assays in 5 panels, and untargeted mass spectrometry-based proteomics data from plasma and CSF sources at the fragment, protein, and peptide levels, along with normalized batch-corrected matrix data.

Targeted Proteomics: The bridged targeted proteomics datasets comprise CSF- and plasma-based assays in 5 panels: cardiometabolic, inflammation, neurology, and oncology. These are released in file-based formats and as queryable tables in BigQuery. NPX files for each of the CSF (CSF) and Plasma (PLA) tissue sources are available as per-panel files and as aggregated matrix files. Per-panel files have also been released in Explore format for use with Olink analysis tools.

Four types of data tables for each of the CSF (CSF) and Plasma (PLA) tissue sources are available in BigQuery and can be accessed directly through Google interfaces or through notebooks and tools in Terra. AMP PD provides examples of how to access, retrieve, and use the targeted proteomics data through workspaces and notebooks in Terra. Registered users can access the workspaces below, which include Jupyter notebooks written in R and in Python that can be cloned and edited to suit your needs.

AMP PD - Proteomics Getting Started
The purpose of this workspace is to provide information to help you get started. This workspace includes directory listings and notebooks to help access, retrieve, analyze, and download proteomics data.

AMP PD - Proteomics QC and Analysis : Release 4.0
The purpose of this workspace is to provide QC and analysis notebooks for AMP PD Tier 2 users.

Targeted Proteomics PCA: graphing principal component analysis
NPX Boxplots: subset proteomics data to generate boxplots
APOE Proteomics Case Study: protein expression and APOE genotype correlation, case/control deltas
QCPlots: simple quality control checks and QC plots to show a visual representation of the QC data
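As a rough illustration of querying the targeted proteomics tables in BigQuery from a notebook, the sketch below builds a parameterized SQL string. The table name passed in, the `panel` column, and the helper `build_npx_query` are assumptions made for illustration; the actual table names and schemas should be taken from the AMP PD workspaces.

```python
def build_npx_query(table, panel=None, limit=100):
    """Build a SQL string selecting rows from a targeted proteomics table.

    `table` is a fully qualified BigQuery table name supplied by the caller.
    The `panel` column is an assumption about the table schema.
    """
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    sql = f"SELECT * FROM `{table}`"
    if panel is not None:
        # A named query parameter (@panel) keeps user input out of the SQL text.
        sql += " WHERE panel = @panel"
    return sql + f" LIMIT {limit}"


# Hypothetical table name, for illustration only.
sql = build_npx_query("my-project.my_dataset.npx_csf", panel="neurology", limit=10)
print(sql)

# Running the query requires the google-cloud-bigquery package and credentials:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   config = bigquery.QueryJobConfig(query_parameters=[
#       bigquery.ScalarQueryParameter("panel", "STRING", "neurology")])
#   df = client.query(sql, job_config=config).to_dataframe()
```

Keeping the query construction separate from execution makes it easy to inspect the SQL before it is run against a billed project.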

Untargeted Proteomics: The untargeted proteomics datasets comprise CSF- and plasma-based assays that use mass spectrometry via the Orbitrap Exploris platform and OpenSWATH target extraction software to produce raw, processed, and aggregate expression data at the fragment, protein, and peptide levels. Single-sample data is available for the CSF (CSF) assay in raw and mzML formats, along with normalized batch-corrected matrix data in CSV format. The plasma (PLA) assay includes native and depleted single-sample and aggregate data in the same file formats. Each of the CSF and PLA datasets is accompanied by a complete SDRF metadata file that includes relevant clinical data fields and instrument details.

AMP PD strives to provide clear provenance of the data. As such, this release contains files from each stage of processing so that researchers can see how the data evolved and can process samples differently if they choose. Intermediate data files in this release include fragment-level matrices prepared before batch correction, before imputation, and before normalization were applied.

A data table for each of the CSF (CSF) and Plasma (PLA) assays identifies protein abundance per detected UniProt ID for each participant sample. These tables are available in BigQuery and can be accessed directly through Google interfaces or through notebooks and tools in Terra. AMP PD provides examples of how to access, retrieve, and use the untargeted proteomics data through workspaces and notebooks in Terra. Registered users can access the workspaces below, which include Jupyter notebooks written in R and in Python that can be cloned and edited to suit your needs.

AMP PD - Proteomics Getting Started Tier-2
The purpose of this workspace is to provide information to help you get started. This workspace includes directory listings and notebooks to help access, retrieve, analyze, and download proteomics data.

AMP PD - Proteomics QC and Analysis
The purpose of this workspace is to provide QC and analysis notebooks for AMP PD Tier 2 users.

Untargeted Proteomics PCA: graphing principal component analysis
NPX Boxplots: subset proteomics data to generate boxplots
APOE Proteomics Case Study: protein expression and APOE genotype correlation, case/control deltas
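Analyses such as PCA typically start from a sample-by-protein matrix, whereas the untargeted tables described above hold one abundance value per UniProt ID per participant sample. The sketch below pivots such long-format rows into a nested mapping. The row layout, sample IDs, and abundance values are invented for illustration and do not reflect the actual AMP PD schema; the two UniProt accessions are real (APOE and SNCA).

```python
from collections import defaultdict


def to_matrix(rows):
    """Pivot (sample_id, uniprot_id, abundance) rows into a nested dict
    keyed by sample, then by UniProt ID. Undetected pairs are simply absent."""
    matrix = defaultdict(dict)
    for sample_id, uniprot_id, abundance in rows:
        matrix[sample_id][uniprot_id] = abundance
    return dict(matrix)


# Hypothetical long-format rows: (sample, UniProt accession, abundance).
rows = [
    ("S1", "P02649", 12.5),  # APOE
    ("S1", "P37840", 8.1),   # SNCA
    ("S2", "P02649", 11.9),
]

matrix = to_matrix(rows)
# Fixed column order so every sample row lines up with the same proteins.
proteins = sorted({uid for values in matrix.values() for uid in values})
for sample_id in sorted(matrix):
    # .get() yields None where a protein was not detected in a sample.
    print(sample_id, [matrix[sample_id].get(uid) for uid in proteins])
```

Leaving undetected proteins as missing, rather than filling them with zero, keeps the choice of imputation explicit for the downstream analysis.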

Single Nucleus Brain Sample Data
With the introduction of Single Nucleus data, a new Postmortem cohort has been added to the AMP PD Unified Cohort. This data includes clinical, genomic, and transcriptomic data from single nucleus brain samples. Brain tissue samples from PD case and control participants were prepared for 5 brain regions: the primary visual cortex, primary motor cortex, prefrontal cortex, dorsal motor nucleus of the Xth nerve, and globus pallidus interna. Sample data available in this release includes single sample WGS data in cram and vcf formats, and single sample and aggregate RNA sequence data in h5ad.

BigQuery datasets for Single Nucleus WGS and RNA sequence data include AMP PD standard sample inventory data and metadata, which can be queried to learn file locations and participant data, and a QC flags data table. AMP PD provides examples of how to access and retrieve clinical, WGS, and RNA sequence data in the Single Nucleus cohort through workspaces and notebooks in Terra. Registered users can access the workspace below, which includes Jupyter notebooks written in R and in Python that can be cloned and edited to suit your needs.

AMP PD - Getting Started Tier-2 - Clinical and Omics Access
The purpose of this workspace is to provide information to help you get started. This workspace includes directory listings and notebooks to help access, retrieve, analyze, and download proteomics data.

Clinical - load a table from BigQuery: interacting with and displaying clinical data in a Jupyter notebook
Single Nucleus RNAseq - load H5AD from cloud storage: identify case and control participants and retrieve associated SN data
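The second notebook's workflow (identify case and control participants, then retrieve their single nucleus data) could be sketched roughly as follows. The field names `participant_id` and `case_control`, the example records, and both helper functions are hypothetical. The loader assumes the `anndata` package is installed and that the H5AD file has already been copied from cloud storage to local disk.

```python
def select_by_group(participants, group):
    """Return IDs of participants whose case/control label matches `group`."""
    wanted = group.lower()
    return [p["participant_id"] for p in participants
            if p["case_control"].lower() == wanted]


def load_h5ad(path):
    """Load a local H5AD file as an AnnData object.

    anndata is imported lazily so the selection helper works without it.
    """
    import anndata
    return anndata.read_h5ad(path)


# Hypothetical clinical records, for illustration only.
participants = [
    {"participant_id": "PM-0001", "case_control": "Case"},
    {"participant_id": "PM-0002", "case_control": "Control"},
    {"participant_id": "PM-0003", "case_control": "Case"},
]

cases = select_by_group(participants, "case")
controls = select_by_group(participants, "control")
print(cases, controls)

# With a local copy of a sample's file (the path is a placeholder):
#   adata = load_h5ad("sample.h5ad")
#   print(adata.shape)  # (number of nuclei, number of genes)
```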

V4.0 Release Updates

Additions

New cohort: Postmortem Brain Cohort with clinical, WGS, and RNA Seq data
New data type: Untargeted Proteomics from mass spectrometry
New dataset: Targeted Bridged dataset consolidating 2 previously released targeted datasets for cross analysis
Recovered WGS: A WGS sample that aligns with the new proteomics data was recovered after meeting AMP PD clinical data requirements
Terra Workspace Updates: Getting Started with Proteomics Notebook examples updated to use the Bridged Targeted Proteomics dataset
Terra Workspace Notebooks: Getting Started Notebooks for using and analyzing Untargeted Proteomics

Changes

Corrected BAM index files for EV exRNA pilot samples and recovered two BAM files