Who are we?
The Plant Secretome and Subcellular Proteome KnowledgeBase (PlantSecKB) was created by Dr. Xiangjia (Jack) Min, Gengkom Lum and John Meinken at Youngstown State University (YSU). Dr. Min collected the data and designed the prediction algorithms. Jessica Orr and Stephanie Frazier, undergraduates, curated secreted proteins from recent literature. The work was supported by the Ohio Plant Biotechnology Consortium (through the Ohio State University, Ohio Agricultural Research and Development Center), a grant from YSU Research Council, Research Professorship, and STEM Dean's reassigned time to Dr. Min.
Dramatic increases in the number of protein sequences and full proteomes have led to an increased need for computational tools that can automate analysis of proteins based on the protein sequence. One area where automated analysis has shown considerable promise is in the prediction of protein subcellular location. Many publicly available tools have been developed to analyze a protein sequence for information related to its subcellular location.
The core goal of this project is to combine information from multiple tools in order to produce aggregate predictions that are more accurate than the predictions made by the individual tools alone. Our website offers a single location where researchers can see our predictions together with all of the data we have collected from the individual tools. The knowledgebase also serves as a testing site where we can compare the prediction accuracies of different tools.
The data for this website were retrieved from the UniProtKB April 2013 release, which includes 1,415,921 proteins. For each protein, we perform analysis using SignalP3, SignalP4, TMHMM, Phobius, TargetP, WoLF PSORT, ScanProsite and FragAnchor. The results of all analyses are stored in the database together with the protein information.
Our predictions are made using all data available. For proteins with an annotation for subcellular location (either from UniProt or curated by us), the annotation is used as the prediction. For all other proteins, a combination of the tool analysis results is used. We determine the best algorithms for combining the data using a variety of statistical and data mining techniques. You can see an example of how our secretome prediction algorithm was developed in this paper.
Note: Since the database and the paper describing it were published in CMB (2014), we have made several updates: 1) protein sequences generated from the newly sequenced pineapple genome were added to the database; 2) secreted proteins are now divided into four subcategories: curated, highly likely secreted (3 or 4 out of 4 predictors predict the protein to be secreted), likely secreted (2 out of 4 predictors), and weakly likely secreted (1 out of 4 predictors); 3) for chloroplast protein prediction, only proteins predicted to be chloroplast by both TargetP and WoLF PSORT are classified as chloroplast; 4) for mitochondrial proteins, the WoLF PSORT prediction is used in place of the TargetP prediction. However, if the WoLF PSORT and TargetP predictions are both used, specificity improves and sensitivity decreases slightly.
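The updated rules in the note above can be illustrated with a short Python sketch. This is not the code that runs behind the website; it is a minimal illustration under stated assumptions. The function names, the normalized labels "chloroplast" and "mitochondrial" (the tools themselves report their own codes), and the require_both option (our reading of "both predictions are used" as "both must agree") are all hypothetical.

```python
from typing import Optional


def secreted_category(positive_votes: int, curated: bool = False) -> Optional[str]:
    """Map the number of predictors (out of 4) that call a protein secreted
    to one of the subcategories described in the note."""
    if curated:
        return "curated"
    if not 0 <= positive_votes <= 4:
        raise ValueError("positive_votes must be between 0 and 4")
    if positive_votes >= 3:
        return "highly likely secreted"
    if positive_votes == 2:
        return "likely secreted"
    if positive_votes == 1:
        return "weakly likely secreted"
    return None  # no predictor called the protein secreted


def is_chloroplast(targetp: str, wolfpsort: str) -> bool:
    """Chloroplast only when TargetP and WoLF PSORT agree (labels are assumed)."""
    return targetp == "chloroplast" and wolfpsort == "chloroplast"


def is_mitochondrial(wolfpsort: str, targetp: Optional[str] = None,
                     require_both: bool = False) -> bool:
    """WoLF PSORT decides by default; require_both is a hypothetical stricter mode."""
    if wolfpsort != "mitochondrial":
        return False
    if require_both:
        return targetp == "mitochondrial"
    return True


print(secreted_category(3))                              # highly likely secreted
print(is_chloroplast("chloroplast", "mitochondrial"))    # False
```

Setting require_both=True in this sketch corresponds to the stricter option mentioned at the end of the note, which trades a little sensitivity for better specificity.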
Further Reading
Min XJ. (2010) Evaluation of computational methods for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform. 3:143-147.
Lum G, Min XJ. (2011) FunSecKB: the Fungal Secretome KnowledgeBase. Database - the Journal of Biological Databases and Curation. Vol. 2011. bar001. doi: 10.1093/database/bar001.
Meinken J, Min XJ. (2012) Computational prediction of protein subcellular locations in eukaryotes: an experience report. Computational Molecular Biology. 2(1): 1-7.
Lum G, Vanburen R, Ming R, Min XJ. (2013) Secretome Prediction and Analysis in Sacred Lotus (Nelumbo nucifera Gaertn.). Tropical Plant Biol. 6:131-137.
Lum G, Meinken J, Orr J, Frazier S, Min XJ. (2014) PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase. Computational Molecular Biology. 4(1).
Meinken J, Asch DK, Neizer-Ashun KA, Chang GH, Cooper CR Jr, Min XJ. (2014) FunSecKB2: a fungal protein subcellular location knowledgebase. Computational Molecular Biology. 4(7):1-17.
Using This Website
The home page has four different search options:
Search By ID - Use this option if you have a protein ID from UniProt or NCBI or you know the gene name of the protein you are interested in.
Search By Subcellular Location - Use this option to get a list of all proteins for a species that are predicted in a specific subcellular location. The species can be selected from a list of common species or entered manually.
Search By Protein Keywords or Function - Use this option to get a list of all proteins for a species that match a protein name, function or keyword. For example, to get a list of all proteins involved in amino acid transport, enter the search text "amino acid transport" (word order does not matter). The species can be selected from a list of common species or entered manually.
BLAST Search - This will take you to our BLAST search page where you can search against this database as well as several other databases we maintain.
Searches:
Get a FASTA formatted list of search results:
When doing a search by subcellular location or protein keyword/function, use the "FASTA Download" button to get the results in FASTA format. The results can be easily copied and pasted to a text file if needed. For individual proteins, the FASTA formatted protein sequence is included at the bottom of the results page.
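Once the FASTA output has been pasted into a text file, it can be read with a few lines of Python. The sketch below is only an illustration: the header layout in the example (an accession followed by a description) and the accessions themselves are made up, so adjust the parsing to the headers you actually receive.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return a mapping from FASTA header (without '>') to the joined sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from copy and paste
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may wrap over several lines
    return records


# Hypothetical example data, not real PlantSecKB output.
example = """>P00001 hypothetical protein 1
MKTLLV
AAGS
>P00002 hypothetical protein 2
MSSP
"""

proteins = parse_fasta(example)
for header, sequence in proteins.items():
    print(header.split()[0], len(sequence))
```

For a saved file, the same function can be called on the file contents, for example parse_fasta(open("results.fasta").read()), where the file name is whatever you chose when saving.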
Download the search results as a text file:
When doing a search by subcellular location or protein keyword/function, use the "Search" button to get a paginated list of results. At the top of the page, you can click the link to "Download result set as a tab delimited text file".
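The downloaded tab delimited file can be loaded with Python's standard csv module. The sketch below assumes the first line of the file holds column names; the column names and rows used here (accession, species, location) are hypothetical placeholders, so check the header of your own download before relying on them.

```python
import csv
import io
from typing import Dict, List


def read_results(text: str) -> List[Dict[str, str]]:
    """Parse tab delimited text with a header row into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical example data, not real PlantSecKB output.
example = (
    "accession\tspecies\tlocation\n"
    "P00001\tOryza sativa\tsecreted\n"
    "P00002\tOryza sativa\tchloroplast\n"
)

rows = read_results(example)
secreted = [row["accession"] for row in rows if row["location"] == "secreted"]
print(len(rows), secreted)
```

The length of the returned list gives the same protein count that the results page reports at the top, which is a quick way to confirm that the download is complete.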
Get the count of proteins in a search result set:
When doing a search by subcellular location or protein keyword/function, use the "Search" button to get a paginated list of results. The number of results returned along with a description of the search parameters will be included at the top of the page.
Get our prediction for subcellular location:
The results page contains a summary section at the top and a details section at the bottom. Our prediction can be found in the summary section under "Predicted Subcellular Location(s)". Note that our prediction algorithms can sometimes produce no prediction or more than one prediction. The logic for how the prediction was made will be included next to the prediction.
Get results from individual computational tools:
All of the data we collected from the individual computational tools is included in the details section on the results page.
Get our annotated data:
When available, our curated annotation will be included at the bottom of the details section on the results page. However, most proteins do not have local curated annotations. UniProt annotations for subcellular location are included in the summary table on the results page when available. If you want to see the supporting reference for a UniProt annotation, click the UniProt AC value to view that entry in the UniProtKB.
This database accepts public annotations for subcellular location that are based on experimental evidence. Submissions are added to the database after being reviewed by our curator. We have an online form for submitting protein annotations one at a time; if you have a large number of proteins to submit, you can contact us directly.