STRING 10 
  

Download Area

STRING uses a relational database system (PostgreSQL) to store primary data and precomputed predictions. For convenience, we provide selected data items as flatfiles below.
Please note: the complete STRING dataset is also available, but it requires signing a license agreement (free for academics, see here for details).

Files that do not require a separate license agreement are published under a Creative Commons Attribution 3.0 License or a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
For commercial use or customized versions, please contact biobyte solutions GmbH.

Protein mode (flatfiles)
- File -- Description -- Access -
- protein.sequences.v10.fa.gz (2.2 Gb) -- sequences of all proteins in STRING -- Creative Commons License -
- protein.links.v10.txt.gz (11 Gb) -- protein network data (scored links between proteins) -- Creative Commons License -
- protein.links.detailed.v10.txt.gz (16.7 Gb) -- protein network data (incl. subscores per channel); commercial entities require a license. -- Creative Commons License -
- protein.actions.v10.txt.gz (3 Gb) -- interaction types for protein links -- Creative Commons License -
- protein.links.full.v10.txt.gz (17.8 Gb) -- protein network data (incl. distinction: direct vs. interologs); all users require a license -- license required -

COG mode (flatfiles)
- File -- Description -- Access -
- COG.mappings.v10.txt.gz (127.6 Mb) -- orthologous groups (COGs, NOGs, KOGs, ...) and their proteins -- Creative Commons License -
- protein.sequences.v10.fa.gz (2.2 Gb) -- sequences of all proteins in STRING (can be used as a blast db) -- Creative Commons License -
- species.mappings.v10.txt.gz (19.1 Mb) -- presence / absence of orthologous groups in species -- Creative Commons License -
- COG.links.v10.txt.gz (107.1 Mb) -- association scores between orthologous groups -- Creative Commons License -
- COG.links.detailed.v10.txt.gz (163.2 Mb) -- association scores (incl. subscores per channel); commercial entities require a license. -- Creative Commons License -
 
General flatfiles & full database dumps
- File -- Description -- Access -
- species.v10.txt (141.8 Kb) -- organisms in STRING -- Creative Commons License -
- species.tree.v10.txt (46.5 Kb) -- STRING tree of species -- Creative Commons License -
- database.schema.v10.pdf (119.1 Kb) -- STRING database schema -- Creative Commons License -
- protein.aliases.v10.txt.gz (545 Mb) -- aliases for STRING proteins: locus names, accessions, descriptions... -- Creative Commons License -
- mapping_files (FTP directory) -- separate identifier mapping files, for several frequently used name_spaces... -- Creative Commons License -
- items_schema.v10.sql.gz (4.4 Gb) -- full database, part I: the players (proteins, species, COGs, ...) -- license required -
- network_schema.v10.sql.gz (21.4 Gb) -- full database, part II: the networks (nodes, edges, scores, ...) -- license required -
- evidence_schema.v10.sql.gz (301.9 Gb) -- full database, part III: interaction evidence (datasets, abstracts, predictions, ...) -- license required -
- homology_schema.v10.sql.gz (459.7 Gb) -- full database, part IV: homology data (all-against-all SIMAP similarity searches) -- license required -
 
Please note: STRING is updated periodically, so check back on this page whenever you need the latest associations.
Protein identifiers in the above files each consist of two substrings joined by a dot: 'NNNNN.aaaaaa'. The first substring is the NCBI taxonomy species identifier, and the second substring is the RefSeq/Ensembl identifier of the protein.
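
The identifier layout described above can be taken apart with a few lines of Python. This is only a minimal sketch: the function name `split_string_id` and the sample identifier are illustrative and not part of any STRING tooling. It splits on the first dot only, on the assumption that the protein part may itself contain dots (for example a versioned accession).

```python
def split_string_id(identifier: str) -> tuple[int, str]:
    """Split a 'NNNNN.aaaaaa' identifier into (taxonomy_id, protein_id).

    Hypothetical helper for illustration; not part of STRING itself.
    """
    # Split on the FIRST dot only, so dots inside the protein part survive.
    taxon, sep, protein = identifier.partition(".")
    if not sep or not taxon.isdecimal() or not protein:
        raise ValueError(f"not a 'NNNNN.aaaaaa' identifier: {identifier!r}")
    return int(taxon), protein


# Illustrative identifier: NCBI taxonomy 9606 (human) plus an Ensembl-style name.
print(split_string_id("9606.ENSP00000269305"))  # (9606, 'ENSP00000269305')
```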
Please note that some of the files are very large. You may experience problems downloading them, depending on your browser and/or operating system.
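
Once a large file has been downloaded, it can be processed as a stream rather than loaded into memory. The sketch below keeps only the lines of a gzipped links file that belong to one organism. It rests on assumptions not stated on this page: that each data line is whitespace-separated with the two protein identifiers in the first two columns, and that any header line does not start with a taxonomy number. The function names and file paths are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def filter_links_by_taxon(lines: Iterable[str], taxon_id: int) -> Iterator[str]:
    """Yield lines whose first two columns both start with '<taxon_id>.'.

    Assumes whitespace-separated columns with protein identifiers first.
    """
    prefix = f"{taxon_id}."  # the trailing dot stops 9606 from matching 96061
    for line in lines:
        fields = line.split()
        if (
            len(fields) >= 2
            and fields[0].startswith(prefix)
            and fields[1].startswith(prefix)
        ):
            yield line


def filter_links_file(src_path: str, out_path: str, taxon_id: int) -> int:
    """Stream a gzipped file line by line; return the number of lines kept."""
    kept = 0
    with gzip.open(src_path, "rt") as src, open(out_path, "w") as dst:
        for line in filter_links_by_taxon(src, taxon_id):
            dst.write(line)
            kept += 1
    return kept
```

A call such as `filter_links_file("protein.links.v10.txt.gz", "links.9606.txt", 9606)` would then write the subset for taxonomy identifier 9606 to a plain-text file, reading the archive one line at a time.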