Data Sources

The BMEG is an expanding resource of interconnected data. Sources include:


The Cancer Genome Atlas (TCGA) profiles the DNA, RNA, protein, and epigenetic levels of over 10,000 individuals across 33 cancer types. paper


The Genotype-Tissue Expression (GTEx) project contains RNA expression data from nearly 1,000 non-diseased individuals across 53 tissue sites. paper


The TCGA unified ensemble “MC3” call set was derived from over 10,000 tumor-normal exomes across 33 different cancer types using multiple variant callers. paper


PFAM describes over 13,000 protein families.


The UniProt Knowledgebase (UniProtKB) provides protein sequence and function information. paper


The Gene Ontology, maintained by the Gene Ontology Consortium, is a controlled vocabulary describing knowledge of gene and protein roles in cells. paper


The Cancer Therapeutics Response Portal (CTRP) catalogues response profiles of 481 compounds against 860 cancer cell lines. paper


The Genomics of Drug Sensitivity in Cancer (GDSC) database contains drug sensitivity data for almost 75,000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. paper


The Cancer Cell Line Encyclopedia (CCLE) contains gene expression, chromosomal copy number, somatic variant calls and drug response profiles from 947 human cancer cell lines. paper


The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. paper


The VICC G2P is a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. paper


Publication information from almost 30 million articles.