WGS Variant Effect Predictor Fields
WGS Annotation VEP Fields
After variant calling completes for whole genome sequencing (WGS) data, researchers typically want to know more about each variant, such as whether it affects protein-coding sequence or how common it is in various populations.
AMP PD used tools from the Google Cloud Health team for its variant annotations. The table below lists the annotation fields, and the databases they draw on, that were included in the VEP annotation pipeline.
Field | Type | Mode | Description |
---|---|---|---|
alternate_bases.CSQ | RECORD | REPEATED | List of CSQ annotations for this alternate. |
alternate_bases.CSQ.allele | STRING | NULLABLE | The ALT part of the annotation field. |
alternate_bases.CSQ.Consequence | STRING | NULLABLE | Consequence type of this variant |
alternate_bases.CSQ.IMPACT | STRING | NULLABLE | The impact modifier for the consequence type |
alternate_bases.CSQ.SYMBOL | STRING | NULLABLE | The gene symbol |
alternate_bases.CSQ.Gene | STRING | NULLABLE | Ensembl stable ID of affected gene |
alternate_bases.CSQ.Feature_type | STRING | NULLABLE | Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature. |
alternate_bases.CSQ.Feature | STRING | NULLABLE | Ensembl stable ID of feature |
alternate_bases.CSQ.BIOTYPE | STRING | NULLABLE | Biotype of transcript or regulatory feature |
alternate_bases.CSQ.EXON | STRING | NULLABLE | The exon number (out of total number) |
alternate_bases.CSQ.INTRON | STRING | NULLABLE | The intron number (out of total number) |
alternate_bases.CSQ.HGVSc | STRING | NULLABLE | The HGVS coding sequence name |
alternate_bases.CSQ.HGVSp | STRING | NULLABLE | The HGVS protein sequence name |
alternate_bases.CSQ.cDNA_position | STRING | NULLABLE | Relative position of base pair in cDNA sequence |
alternate_bases.CSQ.CDS_position | STRING | NULLABLE | Relative position of base pair in coding sequence |
alternate_bases.CSQ.Protein_position | STRING | NULLABLE | Relative position of amino acid in protein |
alternate_bases.CSQ.Amino_acids | STRING | NULLABLE | Reference and variant amino acids. Only given if the variant affects the protein-coding sequence |
alternate_bases.CSQ.Codons | STRING | NULLABLE | The alternative codons with the variant base in upper case |
alternate_bases.CSQ.Existing_variation | STRING | NULLABLE | Known identifier of existing variant |
alternate_bases.CSQ.ALLELE_NUM | STRING | NULLABLE | Allele number from input; 0 is reference, 1 is first alternate etc |
alternate_bases.CSQ.DISTANCE | STRING | NULLABLE | Shortest distance from variant to transcript |
alternate_bases.CSQ.STRAND | STRING | NULLABLE | The DNA strand (1 or -1) on which the transcript/feature lies |
alternate_bases.CSQ.FLAGS | STRING | NULLABLE | Transcript quality flags (cds_start_NF, cds_end_NF) |
alternate_bases.CSQ.VARIANT_CLASS | STRING | NULLABLE | Sequence Ontology variant class |
alternate_bases.CSQ.SYMBOL_SOURCE | STRING | NULLABLE | The source of the gene symbol |
alternate_bases.CSQ.HGNC_ID | STRING | NULLABLE | HUGO Gene Nomenclature Committee (HGNC) identifier of the gene |
alternate_bases.CSQ.CANONICAL | STRING | NULLABLE | A flag indicating if the transcript is denoted as the canonical transcript for this gene |
alternate_bases.CSQ.TSL | STRING | NULLABLE | Transcript support level. NB: not available for GRCh37 |
alternate_bases.CSQ.APPRIS | STRING | NULLABLE | Annotates alternatively spliced transcripts as primary or alternate based on a range of computational methods. NB: not available for GRCh37 |
alternate_bases.CSQ.CCDS | STRING | NULLABLE | The CCDS identifier for this transcript, where applicable |
alternate_bases.CSQ.ENSP | STRING | NULLABLE | The Ensembl protein identifier of the affected transcript |
alternate_bases.CSQ.SWISSPROT | STRING | NULLABLE | Best match UniProtKB/Swiss-Prot accession of protein product |
alternate_bases.CSQ.TREMBL | STRING | NULLABLE | Best match UniProtKB/TrEMBL accession of protein product |
alternate_bases.CSQ.UNIPARC | STRING | NULLABLE | Best match UniParc accession of protein product |
alternate_bases.CSQ.GENE_PHENO | STRING | NULLABLE | Indicates if overlapped gene is associated with a phenotype, disease or trait |
alternate_bases.CSQ.SIFT | STRING | NULLABLE | The SIFT prediction and/or score, with both given as prediction(score) |
alternate_bases.CSQ.PolyPhen | STRING | NULLABLE | The PolyPhen prediction and/or score |
alternate_bases.CSQ.DOMAINS | STRING | NULLABLE | The source and identifier of any overlapping protein domains |
alternate_bases.CSQ.HGVS_OFFSET | STRING | NULLABLE | Indicates by how many bases the HGVS notations for this variant have been shifted |
alternate_bases.CSQ.AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes |
alternate_bases.CSQ.AFR_AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes combined African population |
alternate_bases.CSQ.AMR_AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes combined American population |
alternate_bases.CSQ.EAS_AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes combined East Asian population |
alternate_bases.CSQ.EUR_AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes combined European population |
alternate_bases.CSQ.SAS_AF | STRING | NULLABLE | Frequency of existing variant in 1000 Genomes combined South Asian population |
alternate_bases.CSQ.AA_AF | STRING | NULLABLE | Frequency of existing variant in NHLBI-ESP African American population |
alternate_bases.CSQ.EA_AF | STRING | NULLABLE | Frequency of existing variant in NHLBI-ESP European American population |
alternate_bases.CSQ.gnomAD_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes combined population |
alternate_bases.CSQ.gnomAD_AFR_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes African/American population |
alternate_bases.CSQ.gnomAD_AMR_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes American population |
alternate_bases.CSQ.gnomAD_ASJ_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes Ashkenazi Jewish population |
alternate_bases.CSQ.gnomAD_EAS_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes East Asian population |
alternate_bases.CSQ.gnomAD_FIN_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes Finnish population |
alternate_bases.CSQ.gnomAD_NFE_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes Non-Finnish European population |
alternate_bases.CSQ.gnomAD_OTH_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes other combined populations |
alternate_bases.CSQ.gnomAD_SAS_AF | STRING | NULLABLE | Frequency of existing variant in gnomAD exomes South Asian population |
alternate_bases.CSQ.MAX_AF | STRING | NULLABLE | Maximum observed allele frequency in 1000 Genomes, ESP and gnomAD |
alternate_bases.CSQ.MAX_AF_POPS | STRING | NULLABLE | Populations in which maximum allele frequency was observed |
alternate_bases.CSQ.CLIN_SIG | STRING | NULLABLE | ClinVar clinical significance of the dbSNP variant |
alternate_bases.CSQ.SOMATIC | STRING | NULLABLE | Somatic status of existing variant(s); multiple values correspond to multiple values in the Existing_variation field |
alternate_bases.CSQ.PHENO | STRING | NULLABLE | Indicates if existing variant is associated with a phenotype, disease or trait; multiple values correspond to multiple values in the Existing_variation field |
alternate_bases.CSQ.PUBMED | STRING | NULLABLE | PubMed ID(s) of publications that cite existing variant |
alternate_bases.CSQ.MOTIF_NAME | STRING | NULLABLE | The source and identifier of a transcription factor binding profile aligned at this position |
alternate_bases.CSQ.MOTIF_POS | STRING | NULLABLE | The relative position of the variation in the aligned transcription factor binding profile (TFBP) |
alternate_bases.CSQ.HIGH_INF_POS | STRING | NULLABLE | A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) |
alternate_bases.CSQ.MOTIF_SCORE_CHANGE | STRING | NULLABLE | The difference in motif score of the reference and variant sequences for the TFBP |
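
Every CSQ field in the table is typed as a nullable STRING, so numeric values such as allele frequencies and SIFT/PolyPhen scores generally need to be converted before they can be filtered or compared. The Python sketch below shows one way to do that once the values have been read out of the table. The helper names `parse_prediction` and `parse_af` are hypothetical, and the sample strings are illustrative assumptions about the value format (based on the `prediction(score)` description above), not values taken from AMP PD data.

```python
import re
from typing import Optional, Tuple

# Matches the combined "prediction(score)" form, e.g. "deleterious(0.02)".
_PRED_WITH_SCORE = re.compile(r"([A-Za-z_]+)\((\d+(?:\.\d+)?)\)")
# Matches a bare score, e.g. "0.02".
_BARE_SCORE = re.compile(r"\d+(?:\.\d+)?")


def parse_prediction(value: Optional[str]) -> Tuple[Optional[str], Optional[float]]:
    """Split a SIFT or PolyPhen string into (prediction, score).

    Hypothetical helper: handles "prediction(score)", a bare prediction,
    or a bare score. Missing parts are returned as None.
    """
    if not value:
        return None, None
    text = value.strip()
    match = _PRED_WITH_SCORE.fullmatch(text)
    if match:
        return match.group(1), float(match.group(2))
    if _BARE_SCORE.fullmatch(text):
        return None, float(text)
    # Anything else is treated as a prediction label without a score.
    return text, None


def parse_af(value: Optional[str]) -> Optional[float]:
    """Convert an allele-frequency STRING (e.g. gnomAD_AF) to a float.

    Hypothetical helper: assumes a single decimal value per field and
    returns None for NULL, empty, or unparseable input.
    """
    if not value:
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None
```

For example, `parse_prediction("deleterious(0.02)")` yields a label and a score that can be filtered separately, and `parse_af` lets a frequency column be compared against a numeric threshold. Returning `None` for missing or malformed values keeps NULLABLE fields from raising errors during bulk processing.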