Search

TopFIND provides search and filter functionality based on protein information and supporting evidence. A general protein search can be made through the search field in the menu bar on every page as well as through the search and advanced search on the start page. Filter options are displayed in the right column where available.

Search

Proteins can be searched for by:

  • UniProt ID or accession
  • Protein or gene name (full or part of)
  • IPI, GenBank and other identifiers
  • MEROPS protease family identifier
The search for a MEROPS protease family will retrieve all family members.

Terminus modification

To restrict the search to proteins containing a specific amino acid modification at their N- or C-terminus, select the modification of interest from the modification drop-down.
Only modifications that have been identified on at least one terminus are listed. More information on each modification can be found in the list of terminal modifications.
Source: Information on modifications is based on the post-translational-modifications list provided by UniProtKB.

Filter

Evidence filter

By default, all known data is incorporated into the results page. This can be customized through a filter accessible at the top right of the page. After filtering, only data backed by evidence matching the given filter settings is incorporated. This affects the termini, cleavage sites and substrates listed. The network view likewise considers only matching connections, and the cleavage site preference is built only from data matching the criteria.

Modification filter

When browsing modifications of protein termini the displayed modifications can be restricted by:

  1. The type of modification. This groups a number of similar modifications like 'N2-acetylarginine' and 'N-acetylalanine' into the 'acetylation' category.
  2. The protein terminus that becomes modified.

Evidences

Information on termini, cleavages and inhibition is always accompanied by one or several evidences. An evidence contains all available data grouped into different categories (see below) that supports the information.
Evidences can be used to limit the information given for a protein to that supported by a specific evidence (see filter section).

TopFIND requires information on termini, chains, cleavages and interactions to be accompanied by experimental meta-data. This structured information is based on controlled vocabularies where possible and is meant to provide as detailed information as possible.

Perturbation

The type of perturbation used to shift the biological system from the normal physiological condition into a specific state. Examples include chemical induction of apoptosis, gene knockout and overexpression of specific genes.

Evidence name

Every evidence has a unique name that can be used for referencing and filtering. For published work it is usually in the form 'first author - start of publication title', but other forms are possible.

Directness of observation

Directness is a qualitative measure used by TopFIND to describe whether the reported event (terminus, cleavage or inhibition) was observed directly (e.g. Edman sequencing of a protein terminus) or indirectly (e.g. indirect observation of a cleavage through directly assaying the occurrence of a new protein terminus).

Physiological relevance

A qualitative measure of the likelihood that the identified terminus, cleavage or inhibition is present and relevant in vivo.

Method (other)

Assay or methodology used to make the identification. Please use this field to specify the method used if the 'Method' drop-down does not list your method.

Confidence

A measure of the confidence that the identification is correct. The confidence is given as a value between 0 (low) and 1 (high). The type of confidence measure (e.g. MASCOT score or PeptideProphet probability for mass spectrometry based peptide confidence scores) is also given.

Method

The assay or methodology used is specified in the method field. New or rarely used methods are specified in the Method (other) section. Please contact us if you want to add a widely used method to the list of standard methods.

  • unknown
  • other
  • electronic annotation
  • COFRADIC
  • N-TAILS
  • C-TAILS
  • ATOMS
  • Edman sequencing
  • enzymatic biotinylation
  • chemical biotinylation
  • MS gel band
  • MS semitryptic peptide
  • MS other
  • mutation analysis

Protein information

This section provides basic information retrieved from UniProtKB and MEROPS. Protein names and species and, if applicable, protease classification and isoforms are listed. The UniProtKB curated annotation is presented alongside the amino acid sequence. For further background information, links to the appropriate entries at UniProtKB and MEROPS are provided.

Protein annotation

General protein annotation includes protein names, genomic location of the encoding gene, isoform information, amino acid sequence, post-translational modification, biochemical activity, role in disease and more.
Source: UniProtKB

Domains & features

The linear organization of termini, cleavage sites and stable chains as well as features and domains along the primary protein sequence is graphically represented. Moving the mouse over an element will provide additional information such as exact position, type and description.

Network neighborhood

The network view shows the interconnection with other proteins through cleavage, inhibition and protein-protein interactions. Some proteins are combined for conciseness and clarity of the network. The dynamic network representation is realized using Cytoscape Web.
Legend:

  • filled red node: displayed protein
  • V-shaped node: protease
  • rhomboid node: protease inhibitor
  • circular node: other protein
  • blue arrows: cleavage
  • red tee bar: inhibition of proteolytic activity
  • grey line: other protein-protein interaction

Actions:
  • click and hold on free space: move canvas
  • click and hold on protein: drag node
  • double click on protein: open protein page

Termini

N- and C-termini are listed according to their position. Evidence information is given in brief tabular format, and a link to full details, publications and data repository is provided. The termini are put into perspective by listing features and domains, such as propeptides, whose start or end point overlaps with the terminus.

Termini search

Search for termini by starting sequence

Protein N-termini matching a specific start sequence can be searched for by entering one or several amino acids (single letter notation, upper case) in the sequence start field on the N termini search page.

Protein C-termini matching a specific sequence end can be searched for by entering one or several amino acids (single letter notation, upper case) in the sequence end field on the C termini search page.
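
As a rough illustration of what these two searches select, the following Python sketch filters a small list of sequences by start and by end. The sequences and the helper names `n_termini_starting_with` and `c_termini_ending_with` are invented for this example; TopFIND runs the search on its own server, and its actual implementation is not shown here.

```python
# Hypothetical sequences, invented for illustration only.
sequences = ["MKTAYIAK", "AAAGLSDE", "SLIGKVDG", "MRRPLLWA"]

def n_termini_starting_with(start, candidates):
    """Keep sequences whose first residues equal `start` (single letter, upper case)."""
    return [seq for seq in candidates if seq.startswith(start)]

def c_termini_ending_with(end, candidates):
    """Keep sequences whose last residues equal `end` (single letter, upper case)."""
    return [seq for seq in candidates if seq.endswith(end)]

print(n_termini_starting_with("M", sequences))   # ['MKTAYIAK', 'MRRPLLWA']
print(c_termini_ending_with("DG", sequences))    # ['SLIGKVDG']
```

The comparison in this sketch is case-sensitive, which mirrors the upper-case requirement of the search fields.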

Search for termini matching a sequence pattern

You can search for termini matching a complex sequence pattern or motif by entering a regular expression in the 'Sequence matches regular expression' field on the N termini or C termini search page.
Regular expression syntax:

[MRK] A single amino acid: M, R or K
[^MRK] Any single amino acid but M, R, or K
[A-Y] Any single amino acid in the range A-Y (alphabetically)
^ Start of sequence
$ End of sequence
. Any single amino acid
(L|I) L or I
A? Zero or one of A
A* Zero or more of A
A+ One or more of A
A{3} Exactly 3 of A
A{3,} 3 or more of A
A{3,6} Between 3 and 6 of A
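
To try out a pattern before entering it in the search field, the syntax above can be tested locally. The sketch below uses Python's `re` module, which supports all of the constructs listed. The sequences and the helper `matching` are hypothetical, and the regular expression engine used by TopFIND may differ in details not covered by the table.

```python
import re

# Hypothetical N-terminal sequences, invented for illustration only.
n_termini = ["MKTAYIAK", "AAAGLSDE", "SLIGKVDG", "MRRPLLWA"]

def matching(pattern, sequences):
    """Return the sequences in which the regular expression is found."""
    regex = re.compile(pattern)
    return [seq for seq in sequences if regex.search(seq)]

# M at the start of the sequence, followed by R or K
print(matching(r"^M[RK]", n_termini))      # ['MKTAYIAK', 'MRRPLLWA']

# Exactly three A at the start, followed by any amino acid but A
print(matching(r"^A{3}[^A]", n_termini))   # ['AAAGLSDE']

# S at the start, followed by L or I
print(matching(r"^S(L|I)", n_termini))     # ['SLIGKVDG']
```

Without `^` or `$`, a pattern in this sketch matches anywhere in the sequence, so anchoring is needed to express "starts with" or "ends with".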

Contribute

Your data submission is highly valuable to us. We have assembled a short illustrated guide [pdf] on how to enter your experimental information and batch upload your protein termini, cleavages, cleavage sites or inhibitions. Please contact us if there are any questions or you want to enquire about other options for batch data import (e.g. when you already have your data in a structured format such as a database). Even though the 3-step process below is very straightforward, we are always happy to find even more efficient ways.

Please submit your latest termini and protease cleavage identifications along with experimental meta data.

Experimental meta data can be added through a web form with detailed help available. New data sets can be batch-imported from comma-separated values (CSV) files in the following formats:
  1. proteases cleaving a native protein
    please provide the data in the following order using the header line: protease,position,substrate
    • protease:UniProt ID: Q9BYF1
    • position:cleavage site position in the form: 222-223
    • substrate:UniProt ID: Q9BYF1
  2. proteases cleaving a peptide
    please provide the data in the following order using the header line: protease,peptide
    • protease:UniProt ID: Q9BYF1
    • peptide:peptide sequence with the cleavage site indicated by a colon: GA:GQCVFA
  3. proteases inhibited by protein inhibitors
    please provide the data in the following order using the header line: protease,inhibitor
    • protease:UniProt ID: Q9BYF1
    • inhibitor:UniProt ID: Q9BYF1
  4. protein N-termini
    please provide the data in the following order using the header line: protein,position,modification,confidence,confidence_type
    • protein:UniProt ID: Q9BYF1
    • position:first amino acid relative to UniProt full length protein sequence: 222
    • modification:name of the modification of the first amino acid according to TopFIND modifications: acetylation or N-acetylaspartate
    • confidence: a confidence measurement if applicable (e.g. when based on peptide identification by mass spectrometry): 0.95
    • confidence_type: the type of the confidence value given (any of: 'unknown','probability','MASCOT score','X! Tandem score','PeptideProphet probability'): probability
  5. protein C-termini
    please provide the data in the following order using the header line: protein,position,modification,confidence,confidence_type
    • protein:UniProt ID: Q9BYF1
    • position:last amino acid relative to UniProt full length protein sequence: 221
    • modification:name of the modification of the last amino acid according to TopFIND modifications: acetylation or N-acetylaspartate
    • confidence: a confidence measurement if applicable (e.g. when based on peptide identification by mass spectrometry): 0.95
    • confidence_type: the type of the confidence value given (any of: 'unknown','probability','MASCOT score','X! Tandem score','PeptideProphet probability'): probability
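
The file formats above can be produced with any spreadsheet or scripting tool. As a minimal sketch, the Python code below writes two of them with the standard `csv` module, using the example values from the list. The helper names `write_csv` and `split_peptide` are hypothetical, and details such as quoting or line endings expected by the TopFIND importer are assumptions not confirmed here.

```python
import csv
import io

def write_csv(header, rows):
    """Serialize rows to CSV text, starting with the given header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Format 4: protein N-termini
n_termini_csv = write_csv(
    ["protein", "position", "modification", "confidence", "confidence_type"],
    [["Q9BYF1", 222, "acetylation", 0.95, "probability"]],
)

# Format 1: proteases cleaving a native protein
cleavage_csv = write_csv(
    ["protease", "position", "substrate"],
    [["Q9BYF1", "222-223", "Q9BYF1"]],
)

def split_peptide(peptide):
    """Split a peptide such as 'GA:GQCVFA' at the colon marking the cleavage site."""
    before, after = peptide.split(":")
    return before, after

print(n_termini_csv)
print(cleavage_csv)
print(split_peptide("GA:GQCVFA"))  # ('GA', 'GQCVFA')
```

To create an actual file, the returned text can be written to disk, or `csv.writer` can be pointed at a file opened with `newline=""`.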