Postmortem Sequencing Data
The release 4 post-mortem sequencing cohort includes 100 whole-genome sequences (WGS) derived from primary visual cortex and 444 single-nucleus transcriptomes (snRNA-seq) totaling ~2.4 million brain cells from 5 brain regions. The sequenced brain tissue came from 100 individuals: people at various stages of PD progression, as quantified by the Braak scoring system, and unaffected controls (UC). The five brain regions profiled by single-nucleus RNA sequencing were selected to represent early through late stages of Parkinson’s disease (Fig. 1): 1) dorsal motor nucleus of the Xth nerve (DMNX); 2) globus pallidus interna (GPI); 3) primary motor cortex (PMC); 4) dorsolateral prefrontal cortex (DLPFC); and 5) the typically unaffected primary visual cortex (PVC).
Sample data available in release 4 include WGS data for the primary visual cortex samples in CRAM and VCF formats, and snRNA-seq data for samples from 5 brain regions, both as separate h5ad files and as an aggregated h5ad file containing the transcriptomic signal from all samples. BigQuery datasets for snRNA-seq and WGS sequence data include AMP PD sample inventory metadata, which can be queried to learn file locations and participant data, and a QC flags data table. AMP PD provides examples of accessing and retrieving clinical, WGS, and RNA sequence data in the postmortem sequencing cohort through workspaces and notebooks in Terra. Registered users can access the workspace below, which includes Jupyter notebooks written in R and in Python that can be cloned and edited to suit your needs.
Data Availability
The data available from the release 4 postmortem sequencing cohort include clinical data, whole-genome sequencing data, and single-nucleus RNA sequencing data from case and control subjects of both sexes and a range of ages, as summarized below. The samples were taken from post-mortem brains at four brain banks: Mount Sinai Brain Bank (MS), University of Miami Brain Endowment Bank (UM), Harvard Brain Tissue Resource Center (HA), and Udall Center of Excellence for Parkinson's Disease Research (UD).
Brain Bank | Samples | Case | Control | Male | Female | Age under 70 | Age 70 - 85 | Age over 85 | Clinical data | WGS data | snRNAseq data |
---|---|---|---|---|---|---|---|---|---|---|---|
Total | 100 | 75 | 25 | 62 | 38 | 16 | 63 | 21 | 100 | 100 | 97 |
UD | 21 | 21 | 0 | 14 | 7 | 1 | 13 | 7 | 21 | 21 | 21 |
HA | 25 | 22 | 3 | 15 | 10 | 6 | 15 | 4 | 25 | 25 | 25 |
UM | 30 | 20 | 10 | 19 | 11 | 6 | 16 | 8 | 30 | 30 | 27 |
MS | 24 | 12 | 12 | 14 | 10 | 3 | 19 | 2 | 24 | 24 | 24 |
Clinical Data
The following assessments are available for participants in the post-mortem brain study and are harmonized with assessments from other cohorts available in AMP PD. Click an assessment to view additional details such as its definition, values, schema, and curation notes.
Table Name (by brain bank) | UD | HA | UM | MS |
---|---|---|---|---|
Demographics | X | X | X | X |
PD_Medical_History | X | X | X | X |
Family_History_PD | X | X | ||
Smoking_and_alcohol_history | X | X | X | |
Epworth_Sleepiness_Scale | X | |||
LBD_Cohort_Clinical_Data | X | X | X | X |
LBD_Cohort_Path_Data | X | X | X | X |
MDS_UPDRS_Part_I | X | |||
MDS_UPDRS_Part_III | X | X | ||
MDS_UPDRS_Part_IV | X | |||
MMSE | X | X | ||
Modified_Schwab___England_ADL | X | |||
UPDRS | X |
Baseline Visit: All AMP PD clinical data includes a baseline visit, which is defined as the participant’s first clinical evaluation as part of the cohort.* All visit months are calculated relative to the participant's baseline visit (e.g., “Month 12” indicates data obtained 1 year after the baseline visit). Other clinical data, such as diagnosis date, medication start date, etc. are calculated in months, relative to the baseline visit date, in order to allow researchers to establish a timeline for disease progression and treatment milestones.
For studies in which post-mortem samples were collected, such as the Postmortem Sequencing Data cohort, the date of death is used as a substitute for the baseline visit. When premortem clinical data are available for a participant in a postmortem study, visit dates will be negative, indicating the number of months prior to death the event occurred.
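The sign convention above can be illustrated with a small Python sketch. It is not AMP PD's curation code: the function name, the average month length, and the rounding to the nearest half month are all assumptions made for illustration.

```python
from datetime import date

# Assumption: an average month length; the curated data may use a different convention.
DAYS_PER_MONTH = 365.25 / 12


def months_from_baseline(event: date, baseline: date) -> float:
    """Signed months between an event and the baseline, rounded to the nearest 0.5.

    For postmortem cohorts the baseline is the date of death, so premortem
    events come out negative.
    """
    days = (event - baseline).days
    return round(days / DAYS_PER_MONTH * 2) / 2


# Hypothetical dates: a clinical visit one year before death.
date_of_death = date(2015, 6, 1)
print(months_from_baseline(date(2014, 6, 1), date_of_death))
```

Under these assumptions, a visit one year before death maps to month -12, and a visit two weeks after a baseline maps to month 0.5.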
*BioFIND participants had an initial screening visit, followed by a more comprehensive evaluation and biospecimen collection visit 2 weeks later. Most BioFIND clinical and biospecimen data are therefore recorded as “Month 0.5”.
WGS
- Data available for individual samples (primary visual cortex samples only)
  - CRAM
  - gVCF - Genomic variant call format produced by the GATK pipeline (suffix .g.vcf.gz)
  - Metric files - Insert size metrics (suffix .insert_size_metrics), duplication metrics (suffix .duplicate_metrics), GC bias metrics (suffixes .gc_bias.detail_metrics, .gc_bias.pdf, and .gc_bias.summary_metrics), alignment summary metrics (suffix .alignment_summary), WGS metrics (suffix .wgs_metrics)
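When browsing a sample's files, the suffixes listed above can be used to tell the file types apart. The following is a minimal Python sketch: the helper name and the example file name are hypothetical, and the `.cram` suffix is an assumption (the list above names the format but not its suffix).

```python
from typing import Optional

# Suffix -> description, taken from the list above.
WGS_SUFFIXES = {
    ".cram": "CRAM",  # assumption: suffix not stated in the list above
    ".g.vcf.gz": "gVCF",
    ".insert_size_metrics": "insert size metrics",
    ".duplicate_metrics": "duplication metrics",
    ".gc_bias.detail_metrics": "GC bias metrics (detail)",
    ".gc_bias.pdf": "GC bias metrics (plot)",
    ".gc_bias.summary_metrics": "GC bias metrics (summary)",
    ".alignment_summary": "alignment summary metrics",
    ".wgs_metrics": "WGS metrics",
}


def classify_wgs_file(filename: str) -> Optional[str]:
    """Return the file type for a known suffix, or None if unrecognized."""
    # Check longer suffixes first so the most specific match wins.
    for suffix in sorted(WGS_SUFFIXES, key=len, reverse=True):
        if filename.endswith(suffix):
            return WGS_SUFFIXES[suffix]
    return None


# Hypothetical file name, for illustration only.
print(classify_wgs_file("SAMPLE-0001.g.vcf.gz"))
```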
RNAseq
- Data available for multiplexed samples
  - Multiplexed FASTQ - Raw sequencing data corresponding to up to 6 samples from different individuals.
- Data available for individual samples
  - Individual H5AD - Count matrix of gene expression for all cells associated with the given individual (see format definition). No quality control filters were applied.
- Data available for the full sample set
  - Merged H5AD - Count matrix of gene expression for all cells associated with all individuals (see format definition). Only high-quality cells were kept (UMI counts: 1,500 ≤ n_UMIs ≤ 110,000; gene counts: 1,100 ≤ n_genes ≤ 12,500; mitochondrial content: less than 2%), but no advanced quality control (e.g. doublet detection) was applied.
Sample Selection Process
The sample selection process aimed to select individuals representing the whole spectrum of PD stages for whom all five brain regions of interest were available. To satisfy this condition, we worked with four brain banks: Mount Sinai Brain Bank (MSSM), University of Miami Brain Endowment Bank (UMBEB), Harvard Brain Tissue Resource Center (Harvard), and Udall Center of Excellence for Parkinson's Disease Research (Udall). From each brain bank, we aimed to obtain a similar distribution of sex and age.
Library Prep & Protocol Details (WGS)
DNA was extracted from tissue samples using the QIAamp DNA kit (Qiagen, kit number 51306), according to the manufacturer’s instructions. Once DNA was extracted, samples were quantified using the Qubit Fluorometer (Life Technologies) and PicoGreen (Thermo Fisher), and sample quality was evaluated by checking Fragment Analyzer (Advanced Analytical) traces. WGS libraries were prepared using the TruSeq DNA PCR-free Library Preparation Kit (Illumina, kit number 20015965, lot numbers 20698565 and 20706057) in accordance with the manufacturer’s instructions. Briefly, 1 µg of DNA was sheared using a Covaris LE220 sonicator (adaptive focused acoustics). DNA fragments underwent bead-based size selection and were subsequently end-repaired, adenylated, and ligated to IDT for Illumina TruSeq DNA UD Indexes (kit 20040870, lot number 20704419). Final libraries were quantified using the Qubit Fluorometer (Life Technologies) or SpectraMax M2 (Molecular Devices) and Fragment Analyzer (Advanced Analytical) or Agilent 2100 BioAnalyzer. Libraries were sequenced on an Illumina NovaSeq 6000 sequencer using 2 × 150 bp cycles and S4 reagent kit v1.5 (catalog number 20028312).
Library Prep & Protocol Details (snRNA-seq)
All buffers were supplemented with RNAse inhibitors (Takara). Six samples were processed in parallel. Twenty-five mg of frozen postmortem human brain tissue from each specimen was homogenized in cold lysis buffer (0.32 M Sucrose, 5 mM CaCl2, 3 mM Magnesium acetate, 0.1 mM EDTA, 10 mM Tris-HCl, pH 8, 1 mM DTT, 0.1% Triton X-100) and filtered through a 40 µm cell strainer. The flow-through was underlaid with sucrose solution (1.8 M Sucrose, 3 mM Magnesium acetate, 1 mM DTT, 10 mM Tris-HCl, pH 8) and centrifuged at 107,000 g for 1 hour at 4 °C. Pellets were resuspended in PBS supplemented with 0.5% bovine serum albumin (BSA). Resuspended nuclei were quantified (Countess II, Life Technologies), and 2 million nuclei from each sample were pelleted at 500 g for 5 minutes at 4 °C and resuspended in 100 µl staining buffer (2% BSA, 0.02% Tween-20 in PBS). Each sample was incubated with 1 µg of a distinct TotalSeq-A nuclear hashing antibody (Biolegend) for 30 min at 4 °C. Prior to Fluorescence-Activated Nuclei Sorting (FANS), volumes were brought up to 250 µl with PBS, and 7-AAD (Invitrogen) was added to facilitate the detection of nuclei. 7-AAD-positive nuclei were sorted into tubes pre-coated with 5% BSA using a FACSAria flow cytometer (BD Biosciences).
Following FANS, nuclei were washed twice in staining buffer before being resuspended in 22 µl PBS and quantified. Nuclei concentrations were normalized, and equal amounts from each sample were pooled together. Two aliquots of 60,000 pooled nuclei (i.e. 10,000 per sample) were processed in parallel using 3’ v3.1 reagents (10x Genomics). At the cDNA amplification step (step 2.2), reactions were supplemented with a hash-tag oligo (HTO) cDNA “additive” primer (GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT*C*T; *Phosphorothioate bond). Following cDNA amplification, supernatants from the 0.6x SPRI selection step were retained for HTO library generation. Otherwise, cDNA libraries were prepared according to the manufacturer’s instructions (10x Genomics). HTO libraries were prepared as described in Stoeckius et al. 2018. All libraries were sequenced at NYGC using the NovaSeq platform (Illumina).
WGS Data Processing and Quality Control
All 100 samples passed quality control based on the following checks: sample-level contamination (VerifyBamID FREEMIX < 0.03), read coverage (mean coverage ≥ 25), transition/transversion ratio (TiTv ratio less than 4 median absolute deviations from dbSNP), sample duplication (KING), and concordance with clinically reported sex (PLINK).
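The two checks with explicit numeric thresholds can be expressed as a short Python sketch. The class and field names are hypothetical, and the TiTv, duplication, and sex checks are left out because they depend on external tools and reference values not given here.

```python
from dataclasses import dataclass


@dataclass
class WgsSampleMetrics:
    """Hypothetical container for per-sample QC metrics."""
    freemix: float        # VerifyBamID contamination estimate
    mean_coverage: float  # mean read coverage


def passes_threshold_checks(m: WgsSampleMetrics) -> bool:
    """Apply the contamination and coverage thresholds stated above."""
    return m.freemix < 0.03 and m.mean_coverage >= 25


# Illustrative values only, not taken from the cohort.
print(passes_threshold_checks(WgsSampleMetrics(freemix=0.01, mean_coverage=32.5)))
```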
snRNA-seq Data Processing and Quality Control
Sequencing reads were aligned to hg38 using STARsolo. For genotype-based demultiplexing, cellSNP-lite piled up expressed alleles at polymorphic sites with a minimum minor allele frequency of 0.1 and a minimum aggregated unique molecular identifier (UMI) count of 20. Vireo then clustered cells into groups corresponding to the six distinct donors in each pool, and each group was assigned a donor identity by genotype concordance analysis against reference SNP-array data using QTLtools-mbv. While most pools contained cells from the expected donors, the genotype concordance results were used to detect and correct occasional sample swaps and mislabeling.
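To make the two site-level thresholds concrete, here is a simplified Python sketch of the filter. It is not cellSNP-lite's implementation: it assumes biallelic sites with reference and alternate UMI counts already aggregated across cells, and the function name is hypothetical.

```python
import numpy as np


def select_informative_sites(ref_counts, alt_counts, min_maf=0.1, min_count=20):
    """Boolean mask of sites passing the minimum count and minor allele frequency.

    Simplification: only two alleles per site are considered.
    """
    ref = np.asarray(ref_counts, dtype=float)
    alt = np.asarray(alt_counts, dtype=float)
    total = ref + alt
    # Minor allele frequency; sites with no coverage get 0.
    maf = np.divide(np.minimum(ref, alt), total,
                    out=np.zeros_like(total), where=total > 0)
    return (total >= min_count) & (maf >= min_maf)


# Toy counts for four sites: only the first has enough UMIs and a balanced enough minor allele.
mask = select_informative_sites([16, 5, 95, 0], [4, 5, 5, 0])
print(mask)
```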
Following genome alignment and demultiplexing, we removed samples with fewer than 50 cells, to reduce potential noise in downstream analyses, as well as samples that did not match the expected genotype. This step led to the removal of 46 samples (out of 500 samples). Then, at the cell level, we filtered out low-quality cells based on UMI counts (1,500 ≤ n(UMI) ≤ 110,000), gene counts (1,100 ≤ n(genes) ≤ 12,500), and mitochondrial content (< 2%). Checks for contamination, including ambient RNA and known confounding features, were performed. Finally, at the feature level, genes not robustly expressed by at least 0.05% of cells/nuclei were excluded. After performing these steps, we obtained a cell expression matrix with 2.4 million cells in 444 samples.
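The cell-level and feature-level filters can be sketched in Python with NumPy on a dense cells-by-genes count matrix. This is an illustration under stated assumptions, not the pipeline's code: the function name is hypothetical, "robustly expressed" is approximated as a nonzero count, and the 0.05% fraction is computed over the cells that pass the cell-level filter.

```python
import numpy as np


def qc_masks(counts, is_mito, min_umis=1500, max_umis=110000,
             min_genes=1100, max_genes=12500,
             max_mito_frac=0.02, min_cell_frac=0.0005):
    """Return (keep_cells, keep_genes) boolean masks for a cells x genes matrix."""
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    n_umis = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    # Fraction of each cell's UMIs that come from mitochondrial genes.
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(n_umis, 1)
    keep_cells = ((n_umis >= min_umis) & (n_umis <= max_umis)
                  & (n_genes >= min_genes) & (n_genes <= max_genes)
                  & (mito_frac < max_mito_frac))
    kept = counts[keep_cells]
    if kept.shape[0] == 0:
        keep_genes = np.zeros(counts.shape[1], dtype=bool)
    else:
        # Keep genes detected in at least min_cell_frac of the retained cells.
        keep_genes = (kept > 0).mean(axis=0) >= min_cell_frac
    return keep_cells, keep_genes


# Toy matrix (3 cells x 4 genes, last gene mitochondrial) with scaled-down
# thresholds so that the example fits on screen.
toy = [[50, 30, 0, 0], [5, 0, 0, 0], [40, 40, 0, 20]]
cells, genes = qc_masks(toy, [False, False, False, True],
                        min_umis=10, max_umis=100, min_genes=2, max_genes=4)
print(cells, genes)
```

The default arguments carry the thresholds quoted above; the toy call overrides the UMI and gene bounds only because a four-gene matrix cannot reach 1,100 detected genes.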