Frequently Asked Questions about the Plant Secretome KnowledgeBase (PlantSecKB)
Who are we?
The Plant Secretome KnowledgeBase (PlantSecKB) was designed by Dr. Xiangjia (Jack) Min, Gengkon Lum, and John Meinken at Youngstown State University (YSU). Gengkon and John, both MS graduate students, primarily implemented the database. Jessica Orr and Stephanie Frazier, undergraduates, curated secreted proteins from recent literature. The work was supported by a grant from the Ohio Plant Biotechnology Consortium and a grant from the University Research Council, YSU. This server is supported by YSU.
Our motivation
Predicting the subcellular locations of plant proteins is essential for understanding their functions. Our focus is on secreted proteins, collectively called the "secretome". We also provide other subcellular locations predicted using computational tools. In addition, our server aims to collect curated protein subcellular locations, supported by experimental evidence in the literature, from the plant research community.
Overview of PlantSecKB
Data sources
- The database consists of two sub-databases built from two data sources. The main database, which can be accessed directly from the user interface, was constructed from data downloaded from UniProtKB. The current version of PlantSecKB uses UniProtKB release-2013-April and predicted proteins from the recently sequenced sacred lotus genome (see Lum et al. 2013). The second database, an EST database, was constructed from the PlantGDB EST assembly. The current EST database consists of data released by PlantGDB by Dec. 2011.
Methods
- We used eight computational tools to predict the subcellular locations of plant proteins: SignalP (versions 3.0 and 4.0), Phobius, TargetP, TMHMM, WoLF-Psort, PS-Scan, and FragAnchor. The prediction accuracy for plant secreted proteins was evaluated by Min (2010). When SignalP, Phobius, and TargetP were combined for signal peptide prediction, with TMHMM used to remove membrane proteins and PS-Scan used to remove ER luminal proteins, the accuracy, expressed as the Matthews Correlation Coefficient (MCC), was 73.2%, with a sensitivity of 84.7% and a specificity of 98.7% (see Min 2010, Table 3). WoLF-Psort was not used for secreted protein prediction because it significantly decreased the prediction sensitivity for plant secreted proteins (Min 2010). It was, however, used for predicting other subcellular locations, including nucleus, vacuole, etc. (see our paper for details).
- To learn more about the output formats of these tools when interpreting the results, please use the following links to access their help/output pages: SignalP; Phobius; TargetP; TMHMM; PS-Scan; FragAnchor; and WoLF-Psort.
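For readers unfamiliar with the evaluation measures quoted above, the following is a minimal sketch of how sensitivity, specificity, and MCC are commonly computed from a confusion matrix. It assumes the standard textbook definitions; Min (2010) should be consulted for the exact definitions used in that evaluation. The function names and the counts in the usage example are purely illustrative and are not data from the paper.

```python
import math

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true secreted proteins that were predicted as secreted."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-secreted proteins that were predicted as non-secreted."""
    return tn / (tn + fp)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for illustration only (not from Min 2010).
tp, tn, fp, fn = 8, 9, 1, 2
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"MCC         = {mcc(tp, tn, fp, fn):.3f}")
```

MCC ranges from -1 to 1, so a value reported as 73.2% corresponds to 0.732 on that scale.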
Access to PlantSecKB
- The PlantSecKB user interface provides: (1) a search of the PlantSecKB data generated from the UniProtKB data source; (2) a link to the EST database for mining ESTs; (3) a link to the BLAST search page; and (4) a link to the curation submission page.
- PlantSecKB can be searched by: (1) keywords, including protein name and function (such as: amylase, alpha amylase, barley alpha amylase, alpha amylase Hordeum vulgare); (2) UniProt Accession number (AC) (such as P00693), UniProt ID, NCBI GI, or RefSeq Accession number; (3) species, by selecting a species name from the list and then choosing a category of subcellular location to search or download (the list contains all species having more than 1000 protein entries); and (4) species name entered as text, for species not on the list (secretome only). IDs from other databases may need to be mapped to UniProt ACs using the UniProt ID mapping utility before searching PlantSecKB. If a user has a protein or DNA sequence, a BLAST search against our database can be performed from our BLAST server.
- The EST database can be searched using an EST ID or keywords. The predicted secreted proteins can be searched by choosing a species from a list. Because EST sequences are often partial, i.e., they do not contain a complete protein coding region, the accuracy of the predicted protein (peptide) subcellular locations has not yet been thoroughly evaluated. The data should be carefully examined manually before further use.
Definition of each category of "sub-proteome"
The following criteria were applied for classification of protein subcellular locations:
- Membrane proteins: A protein predicted by TMHMM to contain one or more transmembrane domains was classified as a membrane protein. If only one transmembrane domain was predicted, it was located within the N-terminal 70 amino acids, and a signal peptide was also predicted, the protein was not counted as a membrane protein. This category contains three sub-categories: chloroplast membrane proteins, mitochondrial membrane proteins, and other membrane proteins. Other membrane proteins are membrane proteins that are not targeted to mitochondria or chloroplasts.
- Chloroplast proteins: A protein predicted by TargetP to have a "C" subcellular location was classified as a chloroplast protein. If it was also classified as a membrane protein by TMHMM, it was further classified as a chloroplast membrane protein.
- Mitochondrial proteins: A protein predicted by TargetP to have an "M" subcellular location was classified as a mitochondrial protein. If it was also classified as a membrane protein by TMHMM, it was further classified as a mitochondrial membrane protein.
- Luminal ER proteins: Proteins predicted to contain a signal peptide by SignalP 3.0 and an ER target signal (Prosite: PS00014) by PS-Scan were treated as luminal ER proteins.
- Complete secretomes: All secreted proteins from an organism. Only proteins that were predicted to have a signal peptide by all three predictors (SignalP, Phobius, and TargetP) and that were not classified as belonging to any of the above categories were included as secreted proteins. However, proteins whose UniProtKB "subcellular location" annotation was other than "secreted" or "extracellular" were excluded from this category. All manually curated secreted proteins, including entries annotated as "secreted" by UniProtKB and entries we manually curated as "secreted", were included in the complete secretomes regardless of whether a signal peptide was present.
- Curated secreted proteins: Proteins retrieved from UniProt/Swiss-Prot that were manually annotated as "secreted" in the subcellular location field, together with secreted proteins that we manually collected from recent literature.
- GPI-anchored proteins: Secreted proteins that were predicted by FragAnchor to have a GPI anchor were further classified as GPI-anchored proteins. Proteins predicted to have both a signal peptide and a GPI anchor may attach to the outer leaflet of the plasma membrane or be secreted and become components of the cell wall.
- Other subcellular proteomes: please read the paper below for details of the methods used.
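The criteria above can be read as a small set of rules applied to the per-protein outputs of the prediction tools. The following is a minimal sketch of that reading, not the code used to build PlantSecKB. The `ToolCalls` record, its field names, the label strings, and the order in which the rules are checked are all assumptions made for illustration. In particular, the sketch assumes that the signal peptide in the membrane-protein exception is the SignalP call, that "within the N-terminal 70 amino acids" means the helix ends at or before residue 70, and that TargetP reports "S" for the secretory pathway. The UniProtKB annotation filter and the manually curated entries are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToolCalls:
    """Hypothetical per-protein summary of the tool predictions."""
    signalp_sp: bool = False      # signal peptide predicted by SignalP
    phobius_sp: bool = False      # signal peptide predicted by Phobius
    targetp_loc: str = "_"        # TargetP location: "S", "C", "M", or "_"
    tm_helices: List[Tuple[int, int]] = field(default_factory=list)  # TMHMM (start, end)
    er_signal: bool = False       # PS-Scan hit for Prosite PS00014
    gpi_anchor: bool = False      # GPI anchor predicted by FragAnchor

def is_membrane(p: ToolCalls) -> bool:
    """One or more TM domains, unless a lone N-terminal helix is a likely signal peptide."""
    if not p.tm_helices:
        return False
    lone_n_terminal = len(p.tm_helices) == 1 and p.tm_helices[0][1] <= 70
    return not (lone_n_terminal and p.signalp_sp)

def classify(p: ToolCalls) -> List[str]:
    """Return the category labels for one protein (rule order is an assumption)."""
    labels: List[str] = []
    membrane = is_membrane(p)
    if p.targetp_loc == "C":
        labels.append("chloroplast membrane" if membrane else "chloroplast")
    elif p.targetp_loc == "M":
        labels.append("mitochondrial membrane" if membrane else "mitochondrial")
    elif membrane:
        labels.append("other membrane")
    if p.signalp_sp and p.er_signal:
        labels.append("ER luminal")
    # Secreted: signal peptide called by all three predictors and no category above.
    if not labels and p.signalp_sp and p.phobius_sp and p.targetp_loc == "S":
        labels.append("secreted")
        if p.gpi_anchor:
            labels.append("GPI-anchored")
    return labels

# A lone helix at residues 5-27 plus a signal peptide is not counted as membrane.
example = ToolCalls(signalp_sp=True, phobius_sp=True, targetp_loc="S", tm_helices=[(5, 27)])
print(classify(example))
```

Returning a list rather than a single label keeps the "further classified" cases visible, for example a secreted protein that is also GPI-anchored.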
How to cite us
Lum G, Meinken J, Orr J, Frazier S, Min XJ. (2014) PlantSecKB: the plant secretome and subcellular proteome knowledgebase. Computational Molecular Biology. 4(1):1-17 (doi:10.5376/cmb.2014.04.0001). Our server URL (http://proteomics.ysu.edu/secretomes/plant.php) can also be cited as a reference.
We would like to suggest the following papers for your reference:
Comments and suggestions
Please contact Dr. Min at the YSU Bioinformatics Lab.