We constructed a comprehensive vervet gene set by merging transcripts from NCBI Chlorocebus sabaeus Annotation Release 100 (Pruitt et al. 2012) and ENSEMBL Chlorocebus sabaeus v.1.82 (Aken et al. 2016), excluding pseudogenes. The merged set is non-redundant at the protein level: if an NCBI transcript and an ENSEMBL transcript translate to the same protein sequence, only the NCBI transcript is retained. The gene set contains 23,250 non-coding transcripts and 67,175 protein-coding transcripts (59,868 from NCBI and 7,307 from ENSEMBL), which represent 20,533 protein-coding genes.
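The protein-level deduplication rule can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the `Transcript` record, its biotype labels, and the transcript IDs are hypothetical stand-ins for parsed NCBI and ENSEMBL annotation files, and the sketch assumes translated protein sequences are already available. It removes pseudogenes from both sources and drops only those ENSEMBL transcripts whose translation duplicates an NCBI protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    source: str        # "NCBI" or "ENSEMBL"
    biotype: str       # hypothetical labels: "protein_coding", "non_coding", "pseudogene"
    protein: str = ""  # translated sequence; empty for non-coding transcripts


def merge_gene_sets(ncbi, ensembl):
    """Merge NCBI and ENSEMBL transcripts, excluding pseudogenes.

    An ENSEMBL transcript is dropped when its protein sequence is identical
    to that of an NCBI transcript, so NCBI wins every protein-level tie.
    """
    merged = [t for t in ncbi if t.biotype != "pseudogene"]
    ncbi_proteins = {t.protein for t in merged if t.protein}
    for t in ensembl:
        if t.biotype == "pseudogene":
            continue
        if t.protein and t.protein in ncbi_proteins:
            continue  # same translation already represented by NCBI
        merged.append(t)
    return merged


# Hypothetical example input.
ncbi = [
    Transcript("NM_A", "NCBI", "protein_coding", "MKTAYIAK"),
    Transcript("NR_B", "NCBI", "non_coding"),
]
ensembl = [
    Transcript("ENST_C", "ENSEMBL", "protein_coding", "MKTAYIAK"),  # duplicate protein
    Transcript("ENST_D", "ENSEMBL", "protein_coding", "MAVLGQ"),
    Transcript("ENST_E", "ENSEMBL", "pseudogene"),
]
print([t.transcript_id for t in merge_gene_sets(ncbi, ensembl)])
# ['NM_A', 'NR_B', 'ENST_D']
```

Keying the lookup on the full protein string keeps the comparison exact; a real pipeline might hash sequences instead, but the tie-breaking rule (NCBI preferred) is the same.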

Based on their location within the gene region and the type of nucleotide change, detected SNVs were classified into the following categories, listed in order of priority: stop-gain, stop-loss, splice site (donor, acceptor), missense (damaging, benign, unknown), synonymous, UTR exon, non-coding gene exon, gene flank (upstream, downstream), and intron. Annotation is non-redundant: if a variant receives conflicting annotations in different transcripts of the same gene, the category that appears earliest in this list is assigned. Indels were classified into the same prioritized categories, except that the stop-gain, stop-loss, missense, and synonymous types are replaced by a single general “coding exon indel” type.
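Collapsing per-transcript annotations to one category per variant amounts to picking the lowest-ranked entry in the priority list. The sketch below is a hedged illustration: the category identifiers are hypothetical, subtypes within a group (donor/acceptor, the three missense classes, upstream/downstream) are assumed to rank in the order listed, and the rank given to “coding exon indel” among indel categories is an assumption, since the text does not state it.

```python
# Hypothetical identifiers for the SNV categories above, in priority order
# (earlier = higher priority). Subtypes are assumed to rank as listed.
SNV_PRIORITY = [
    "stop_gain", "stop_loss",
    "splice_donor", "splice_acceptor",
    "missense_damaging", "missense_benign", "missense_unknown",
    "synonymous", "utr_exon", "noncoding_exon",
    "upstream", "downstream", "intron",
]

# Indels: stop-gain, stop-loss, missense and synonymous are replaced by one
# "coding_exon_indel" class. Ranking it right after the splice-site classes
# is an assumption; the text does not give its exact position.
INDEL_PRIORITY = [
    "splice_donor", "splice_acceptor", "coding_exon_indel",
    "utr_exon", "noncoding_exon", "upstream", "downstream", "intron",
]


def collapse_annotations(categories, priority=SNV_PRIORITY):
    """Return the single highest-priority category among a variant's
    per-transcript annotations within one gene."""
    rank = {cat: i for i, cat in enumerate(priority)}
    return min(categories, key=rank.__getitem__)


print(collapse_annotations(["intron", "synonymous", "missense_benign"]))
# missense_benign
print(collapse_annotations(["intron", "coding_exon_indel"], INDEL_PRIORITY))
# coding_exon_indel
```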

Our definition of PTVs (protein-truncating variants) follows the one implemented in LOFTEE (Loss-Of-Function Transcript Effect Estimator): stop-gain SNVs, splice-site-disrupting SNVs or indels, and frameshifting indels. PTVs exclude indels whose length is a multiple of 3 (non-frameshifting, or in-frame, indels), variants in the last 5% of a transcript, variants in non-canonical or NAGNAG splice sites, and variants surrounding short exons (<15 bp). Of 13,777 coding exon indel, splice-site, or stop-gain alleles, 9,574 alleles in 5,668 protein-coding genes pass these PTV criteria and were used in this study.
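The PTV criteria above can be expressed as a single filter function. This is a sketch of the stated rules only, not LOFTEE's own code: the `Allele` fields, category labels, and default values are hypothetical, and it assumes the in-frame rule applies to coding exon indels while the non-canonical, NAGNAG, and short-exon rules apply to splice-site alleles.

```python
from dataclasses import dataclass

PTV_CLASSES = {"stop_gain", "splice_donor", "splice_acceptor", "coding_exon_indel"}
SPLICE_CLASSES = {"splice_donor", "splice_acceptor"}


@dataclass
class Allele:
    category: str                  # hypothetical label from the prioritized annotation
    ref: str
    alt: str
    transcript_pos: int            # 1-based position of the allele in the transcript
    transcript_len: int
    canonical_splice: bool = True  # splice site carries the canonical motif
    nagnag_site: bool = False
    nearby_exon_len: int = 100     # length of the exon the splice site surrounds


def is_ptv(a):
    """Apply the PTV criteria described in the text to one annotated allele."""
    if a.category not in PTV_CLASSES:
        return False
    # In-frame coding indels (length change divisible by 3) preserve the frame.
    if a.category == "coding_exon_indel" and abs(len(a.ref) - len(a.alt)) % 3 == 0:
        return False
    # Last 5% of the transcript: pos / len > 0.95, in exact integer arithmetic.
    if 20 * a.transcript_pos > 19 * a.transcript_len:
        return False
    if a.category in SPLICE_CLASSES:
        if not a.canonical_splice or a.nagnag_site:
            return False
        if a.nearby_exon_len < 15:
            return False  # splice site surrounding a short exon (<15 bp)
    return True


print(is_ptv(Allele("coding_exon_indel", "A", "AT", 100, 1000)))    # True: frameshift
print(is_ptv(Allele("coding_exon_indel", "A", "ATTT", 100, 1000)))  # False: in-frame
print(is_ptv(Allele("stop_gain", "C", "T", 980, 1000)))             # False: last 5%
```

Comparing `20 * pos > 19 * len` instead of `pos / len > 0.95` avoids floating-point rounding at the 5% boundary.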

The ChIP-seq H3K27ac track shows genomic profiles of H3K27 acetylation, obtained by chromatin immunoprecipitation sequencing of liver from three healthy adult vervets (Villar et al. 2015). H3K27ac marks are associated with promoters or enhancers.

The ChIP-seq H3K4me3 track shows genomic profiles of H3K4 trimethylation, obtained by chromatin immunoprecipitation sequencing of liver from three healthy adult vervets (Villar et al. 2015). H3K4me3 marks are associated with promoters.

Villar, Diego, Camille Berthelot, Sarah Aldridge, Tim F. Rayner, Margus Lukk, Miguel Pignatelli, Thomas J. Park, et al. 2015. “Enhancer Evolution across 20 Mammalian Species.” Cell 160 (3): 554–66.

Median gene expression (RPKM) values in 7 vervet tissues, computed from RNA-seq of 58 samples. Each gene expression bar plot is aligned to the leftmost coordinate of the NCBI gene it represents. The tissue RNA-seq data sets are deposited in the Gene Expression Omnibus (GEO) repository under BioProject PRJNA219198.

Jasinska AJ, Zelaya I, Service SK, et al. Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate. Nat Genet. 2017 Dec;49(12):1714-1721. doi: 10.1038/ng.3959. PubMed PMID: 29083405.
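The track description does not say how RPKM was computed. As a hedged illustration, the sketch below uses the standard definition, RPKM = read count × 10^9 / (gene length in bp × total mapped reads), and takes the per-tissue median across samples. The function names, tissue labels, and counts are hypothetical, and the real pipeline may differ in read counting and gene-length conventions.

```python
from collections import defaultdict
from statistics import median


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (standard definition)."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def median_rpkm_by_tissue(samples, gene_length_bp):
    """samples: iterable of (tissue, read_count, total_mapped_reads) for one gene.
    Returns {tissue: median RPKM across that tissue's samples}."""
    values = defaultdict(list)
    for tissue, count, total in samples:
        values[tissue].append(rpkm(count, gene_length_bp, total))
    return {tissue: median(v) for tissue, v in values.items()}


# Hypothetical counts for one 2 kb gene in two tissues.
samples = [
    ("liver", 200, 10_000_000),
    ("liver", 400, 10_000_000),
    ("liver", 600, 20_000_000),
    ("brain", 100, 10_000_000),
]
print(median_rpkm_by_tissue(samples, 2000))
# {'liver': 15.0, 'brain': 5.0}
```

The median is less sensitive than the mean to a single outlier sample, which matters when some tissues have only a few replicates.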

