Target and Drug Lists

Target lists

As introduced in 2013, we offer users a selection of attributed target lists, taken either from supplementary data in the literature or from downloads provided by various databases. Although these lists have seen some user traffic, we have decided not to update them regularly (in the way that we do for every release of our own database). This is mainly because the proliferation of new sets makes it difficult to keep on top of new versions. Anyone who is specifically interested in an update of any of the lists below, or who has difficulty with the in situ downloads, is welcome to contact us.

The criterion for inclusion in these lists is drug target coverage for human proteins. However, the exact definition varies between lists, as explained in the metadata below. This includes different terminology (e.g. "successful", "approved" or "proven"). There are also differences between primary target mappings (~1:1 drug:protein) and secondary or subunit mappings (1:many).

There are many utilities you can explore, but two that you might consider are a) following the database links and b) comparing the lists for intersects (protein IDs in common) and differentials (protein IDs unique to particular lists or subsets). This extends to comparisons with lists you may generate in the course of your own work or take from other published work (e.g. expression data or disease association gene candidates). We would be interested to hear from you a) what other utilities you find valuable and b) which other recently published target lists you recommend for inclusion. If you have an unpublished but openly provenanced new list (e.g. on figshare) we would be pleased to consider it. We may eventually need to cap the number we host, but new lists can displace older ones.
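The intersect and differential comparison described above can be sketched with Python sets. This is a minimal illustration, assuming each list has already been reduced to a plain collection of UniProtKB accessions; the function name `compare_lists`, the list names and the memberships shown are invented for the example and do not reflect the contents of any hosted list.

```python
def compare_lists(lists):
    """Compare named collections of protein IDs.

    Returns (common, unique): the IDs present in every list, and a dict
    giving, for each list, the IDs that occur in no other list.
    """
    sets = {name: set(ids) for name, ids in lists.items()}
    common = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for name, ids in sets.items():
        # Pool the IDs of all the other lists, then subtract them.
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = ids - others
    return common, unique


# Illustrative input only: list names and memberships are invented.
example = {
    "list_a": ["P00533", "P14416", "P35367"],
    "list_b": ["P00533", "P14416"],
    "my_candidates": ["P00533", "P08172"],
}

common, unique = compare_lists(example)
print("in all lists:", sorted(common))
for name, ids in unique.items():
    print("only in", name, ":", sorted(ids))
```

The same function accepts a list of your own (here `my_candidates`) alongside the hosted ones, which is the kind of comparison suggested in the paragraph above.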

Our metadata descriptions are minimal since context is provided in the references and/or the download descriptions for the appropriate databases. The lists are Excel sheets of UniProtKB, HGNC and ChEMBL live links. You should be able to get to most other sources from these three entry points. In addition, if you paste the UniProtKB list into the ID mapping interface you can select different intersects by Boolean selects or post-query display options.

Lists that are not UniProtKB Accessions in the first place are normalised to these (e.g. by mapping Human Gene Nomenclature Committee (HGNC) Symbols or Entrez Gene IDs (EGID) to UniProtKB). They are then filtered to human and Swiss-Prot (i.e. any TrEMBL entries are removed) and to approved drug targets if this is an option in the original list. In such cases the lists we host become transformations, rather than direct facsimiles, of the primary sources. Given that such ID cross-mappings are not perfect, we cannot guarantee their absolute correctness. However, our versions are supplied in good faith and the originals are available in every case. If you need the cross-mapping details for any particular list you are welcome to contact us.
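As a rough illustration of the normalisation just described, the sketch below maps source identifiers to accessions and keeps only human, reviewed (Swiss-Prot) entries. It is not the procedure actually used for the hosted lists: the function name, the shape of the `mapping` table (fields `accession`, `organism`, `reviewed`) and the example rows are assumptions made for the example, and two of the accessions are obvious placeholders.

```python
def normalise_to_swissprot(identifiers, mapping):
    """Map source identifiers to human Swiss-Prot accessions.

    `mapping` is a hypothetical lookup table: identifier -> list of rows,
    each a dict with 'accession', 'organism' and 'reviewed' keys.
    Returns (accessions, unmapped).
    """
    accessions = []
    unmapped = []
    for identifier in identifiers:
        hits = [
            row["accession"]
            for row in mapping.get(identifier, [])
            # Keep human entries only, and drop unreviewed (TrEMBL) ones.
            if row["organism"] == "Homo sapiens" and row["reviewed"]
        ]
        if hits:
            accessions.extend(hits)
        else:
            unmapped.append(identifier)
    # De-duplicate while preserving first-seen order.
    return list(dict.fromkeys(accessions)), unmapped


# Illustrative mapping rows; the EXAMPLE_* accessions are placeholders.
mapping = {
    "EGFR": [
        {"accession": "P00533", "organism": "Homo sapiens", "reviewed": True},
        {"accession": "EXAMPLE_TREMBL_1", "organism": "Homo sapiens", "reviewed": False},
    ],
    "DRD2": [
        {"accession": "P14416", "organism": "Homo sapiens", "reviewed": True},
    ],
    "Egfr": [
        {"accession": "EXAMPLE_MOUSE_1", "organism": "Mus musculus", "reviewed": True},
    ],
}

accessions, unmapped = normalise_to_swissprot(
    ["EGFR", "DRD2", "Egfr", "NOT_A_GENE"], mapping
)
print(accessions)
print(unmapped)
```

Returning the unmapped identifiers separately makes the losses visible. This is one reason a transformed list can end up shorter than the count stated by its source.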

If you are unfamiliar with protein list "slicing and dicing" we recommend the following:

Attributions and brief descriptions for the Excel sheets:

  1. The current list of Guide to PHARMACOLOGY cross-references in UniProtKB. Note these are the 2017 protein accessions (from the three species human, mouse and rat) that have quantitative interactions with any of the ~9,800 ligands, including approved drugs, research compounds and endogenous ligands, mapped to GtoPdb target identifiers. This does not include data from screens. Download list [tab separated file].
    Alternatively, click here for the complete list of targets in GtoPdb which have an annotated UniProtKB/SwissProt accession in all 3 species.
  2. Supplementary data from "The Druggable Genome: Evaluation of Drug Targets in Clinical Trials Suggests Major Shifts in Molecular Class and Indication" (2013) [PMID:24016212]. The comprehensive list includes 461 targets of approved drugs. Download list
  3. A 3-way consensus list from the paper "Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database" (2013) [Abstract]. The 352 are proteins-in-common between the three drug databases. Download list
  4. The Therapeutic Target Database (release 4.3.02, 18th Oct 2013) protein IDs for successful targets (downloaded from here). The web page states 388 but these reduced to 345 human Swiss-Prot accessions. Download list
  5. ChEMBL (release 17 August 2013) includes a download option for approved drug targets. This converted to 251 human Swiss-Prot accessions but note this does not encompass additional protein IDs from target groups. Download list
  6. Supplementary data from "Novelty in the target landscape of the pharmaceutical industry" (2013) [PMID:23903214]. The listing of proven targets converted to 248 human Swiss-Prot accessions. Download list
  7. Supplementary data from "Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds" (2011) [PMID:21569515]. Since there are compound counts included, the original download is included as a worksheet. In this case the Entrez Gene IDs were mapped to 1651 human Swiss-Prot accessions but this includes both approved and research targets. Download list
  8. DrugBank release 3.0. A download of protein IDs directly related to the mechanism of action for drugs with known pharmacological action. The release date was 2011 but the download in 2013 produced 621 human Swiss-Prot accessions (this will be replaced when the downloads for 4.0 become available in Dec 2013). Download list
  9. Supplementary data from "Trends in the exploitation of novel drug targets" (2011) [PMID:21804595]. The list included 438 human Swiss-Prot accessions but note this study has been updated in PMID:24016212 (list 2). Download list

Drug lists

In parallel with the target lists we offer users a selection of attributed drug lists, in the broad sense of encompassing approved, clinical or research small molecules, together with biologicals in some cases. These have been downloaded from databases or extracted as supplementary data from journal papers. We greatly appreciate their availability but note they are supplied as-is. The lists have a variety of utilities but are particularly useful for name, synonym and identifier look-ups. There can be some inter-list discordance in name-to-structure mappings for technical reasons, but you can resolve the individual entries you are interested in. Those with SMILES can be cheminformatically processed (there is a general description of comparing the chemistry and targets between databases in this recent paper). While our own database has full search capability, we strategically curate a smaller, more concise set of term-to-structure mappings between our ligands and proteins. Thus, some of the posted sets include nominally approved prescription compounds we have chosen not to activity-map (e.g. nutraceuticals, vitamins and some non-INNs) since their relationships are multiplexed.

As ever, we would be pleased to hear your views on utility and additional sources you might recommend for inclusion. Since our own approved drug lists are being consolidated and will imminently be updated in PubChem, we shall surface these in due course. In the interim, drug entries can be browsed and retrieved from our ligand list. Attributions and brief descriptions for the Excel sheets are:

  1. PubChem 986 5-way drug set. This intersect (January 2014) was made between DrugBank, Therapeutic Target Database, ChEMBL, Thomson Pharma and with USAN or INN in the record. It was filtered for one component (i.e. salt-stripped) and Mw < 100. It thus forms a "core" set of concordant manually curated structures, most of which (but not all) are approved drugs. The CIDs are also instantiated as a MyNCBI public URL. As a PubChem display, this can be browsed, sorted and downloaded. Download list
  2. ChEMBL_17 10341 entries retrieved with INN or USAN. Downloaded from ChEMBL in November 2013. Includes canonical SMILES and many other column details. Download list
  3. ChEMBL_17 1863 Phase 4 (approved) drugs (as a subset of the 10341 above). Downloaded from ChEMBL and includes canonical SMILES. The corresponding 251 targets are included in our target lists (both will be replaced when ChEMBL_18 is released). Download list
  4. Supplementary data from "The Druggable Genome: Evaluation of Drug Targets in Clinical Trials Suggests Major Shifts in Molecular Class and Indication" (2014) [PMID:24016212]. Downloaded in November 2013, this includes 2455 drug names. Targets are ascribed to each of the drugs, which multiplexes the list out to over 60K rows. The 461 human targets of approved drugs from the same data sheet are included in our target lists. Download list
  5. Supplementary data from "A forensic analysis of drug targets from 2000 through 2012" (2013) [PMID:23756372]. This lists 345 new drugs approved by the FDA between 2000 and 2012 along with their modes of action. Download list
  6. From DrugBank 3.0, 1353 approved drugs downloaded in June 2013. This includes SMILES and PubChem CIDs (this will be replaced when the downloads for 4.0 become available in 2014). Download list
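For the lists that include SMILES, a simple first processing step is to separate single-component entries from multi-component ones (salts, mixtures). The sketch below relies only on the SMILES convention that a dot separates disconnected components. It flags entries rather than stripping counter-ions, it is not the filter that was applied to the PubChem set in list 1, and the function names and example records are assumptions made for illustration. Anything beyond this, such as a molecular weight cut-off, would need a cheminformatics toolkit.

```python
def is_single_component(smiles):
    """True if the SMILES string describes one connected structure.

    In SMILES, '.' separates disconnected components, so a non-empty
    string without a dot is a single component.
    """
    smiles = smiles.strip()
    return bool(smiles) and "." not in smiles


def split_by_component_count(records):
    """Partition (name, smiles) pairs into single- and multi-component."""
    single, multi = [], []
    for name, smiles in records:
        (single if is_single_component(smiles) else multi).append(name)
    return single, multi


# Illustrative records, not rows taken from any of the hosted sheets.
records = [
    ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
    ("aspirin sodium", "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"),
]

single, multi = split_by_component_count(records)
print("single component:", single)
print("multi component:", multi)
```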

Page last updated 27th January 2017