AMP PD Cloud Architecture

All AMP PD data is stored on Google Cloud Platform (GCP) and can be accessed by approved researchers using both AMP PD Researcher Workbench and GCP-native tools.

Data is available as files in Google Cloud Storage and as tables in Google BigQuery.

Google Cloud Storage

Google Cloud Storage is a general-purpose object store. It is most commonly used as a place to store data files, much like a file system. Cloud Storage lets you create "buckets" that hold files (objects). Objects can be stored, listed, and accessed using directory-style paths.
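Directory-style access works because object names may contain "/" characters: the bucket itself is a flat namespace, and a "directory" is simply a shared name prefix. The sketch below emulates that behavior in plain Python, without contacting Cloud Storage. The object names and the list_directory helper are hypothetical and only illustrate the idea; they are not actual AMP PD paths.

```python
# Hypothetical object names; real AMP PD bucket contents will differ.
OBJECT_NAMES = [
    "samples/sample-001/reads.cram",
    "samples/sample-001/variants.vcf.gz",
    "samples/sample-002/reads.cram",
    "README.txt",
]


def list_directory(names, prefix="", delimiter="/"):
    """Emulate a directory listing over a flat list of object names.

    Returns (files, subdirectories) found directly under `prefix`.
    """
    files, subdirs = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue
        # Whatever follows the prefix decides whether this is a direct
        # child "file" or something nested inside a "subdirectory".
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(files), sorted(subdirs)
```

Calling list_directory(OBJECT_NAMES, prefix="samples/") returns no files and the two sample "subdirectories", while an empty prefix returns README.txt and the top-level samples/ entry.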

Cloud Storage can be accessed through a standard command-line tool, a web interface, or an authenticated REST API with client libraries for many programming languages.
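As one illustration of the client-library route, the following is a minimal sketch using the google-cloud-storage Python package. It assumes the package is installed and that you have already authenticated with credentials approved for the data. The helper names split_gs_uri and list_objects, and any bucket or project names you pass in, are assumptions made for this example and do not refer to real AMP PD resources.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/some/prefix' into (bucket, prefix)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, prefix


def list_objects(uri, project):
    """Yield (object_name, size_in_bytes) for objects under a gs:// URI.

    `project` is the Google Cloud project used for the request
    (a placeholder here; substitute your own).
    """
    # Imported lazily so split_gs_uri can be used without the package.
    from google.cloud import storage

    bucket_name, prefix = split_gs_uri(uri)
    client = storage.Client(project=project)
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        yield blob.name, blob.size
```

Iterating over list_objects("gs://example-bucket/samples/", project="example-project") would then print or collect each object name and size; both names in that call are placeholders.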

Data files such as FASTQ, CRAM, and VCF files are stored in Google Cloud Storage.

Google BigQuery

BigQuery is Google's serverless, highly scalable enterprise data warehouse. Because there is no infrastructure to manage, you can focus on analyzing data using familiar SQL without the need for a database administrator.

With Google BigQuery, you can run SQL-style queries on billions of rows and get results back in seconds. All AMP PD data that is tabular in nature is available for SQL access in Google BigQuery.
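To make SQL access concrete, here is a minimal sketch using the google-cloud-bigquery Python package. It assumes the package is installed and that your credentials have access to the table being queried. The helper names, the table ID format check, and the project and table names you supply are assumptions for illustration only and are not actual AMP PD table names.

```python
import re

# Conservative check for a 'project.dataset.table' identifier. This is an
# assumption for the sketch; BigQuery itself accepts a wider range of names.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_row_count_query(table_id):
    """Return a SQL statement that counts the rows in `table_id`."""
    if not _TABLE_ID.match(table_id):
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    return f"SELECT COUNT(*) AS row_count FROM `{table_id}`"


def count_rows(table_id, billing_project):
    """Run the row-count query and return the result as an int.

    `billing_project` is the Google Cloud project that the query job
    runs under (a placeholder here; substitute your own).
    """
    # Imported lazily so build_row_count_query works without the package.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    rows = client.query(build_row_count_query(table_id)).result()
    return next(iter(rows))["row_count"]
```

The same pattern extends to any SQL statement: build the query text, submit it with client.query, and iterate over the rows returned by result().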