Frequently Asked Questions about the OrfPredictor Server

Who are we?
The OrfPredictor (ORF-Predictor) server was implemented by Dr. Xiangjia (Jack) Min while he worked at Concordia University, Montreal, Quebec, Canada, with the principal investigators of the fungal genomics project, including Drs. A. Tsang, R. Storms and G. Butler. Alex Spurmanis designed the logos, and Wei Ding assisted in developing the server interface. The web server was originally installed at Concordia University. The current site is supported by Youngstown State University.

Our motivation
Generating expressed sequence tags (ESTs) remains a primary method for gene discovery in most organisms. Predicting open reading frames (ORFs) and coding regions in EST and cDNA sequences is essential for annotating them functionally. Our server is designed to predict the ORFs of a batch of EST/cDNA sequences. Note: (1) This server is NOT designed for identifying exons (or protein-coding genes) in genomic sequences. For gene prediction from genomic sequences, please use other tools, such as GeneMark or GenScan. (2) Sequences generated by next-generation sequencers are suitable for this tool; however, sequences shorter than 60 bp may NOT be predicted correctly.

How does it work?
If a user provides a BLASTX output file, then for each sequence with a BLASTX hit, the frame used by BLASTX is used to identify the coding region of the EST or cDNA sequence. For sequences without a BLASTX hit, or when no BLASTX output file is provided, the coding regions are predicted from the intrinsic signals of the sequences.
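
The decision rule above can be sketched in Python. This is only an illustration under stated assumptions: this page does not say which intrinsic signals the server uses, so the sketch substitutes a simple stand-in (the longest stop-free stretch across the six frames) for the ab initio step. The function names `longest_open_stretch` and `choose_frame`, and the 0-based coordinates, are hypothetical and are not part of OrfPredictor.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_open_stretch(seq, frame):
    """Return (start, end) of the longest stop-free codon run in one frame.

    frame is 1..3 (forward strand) or -1..-3 (reverse complement).
    Coordinates are 0-based, end-exclusive, on the strand being read.
    """
    strand = seq.upper() if frame > 0 else reverse_complement(seq).upper()
    offset = abs(frame) - 1
    best = (offset, offset)
    run_start = offset
    last = offset
    for pos in range(offset, len(strand) - 2, 3):
        last = pos + 3
        if strand[pos:pos + 3] in STOP_CODONS:
            # A stop codon closes the current run; keep it if it is the longest.
            if pos - run_start > best[1] - best[0]:
                best = (run_start, pos)
            run_start = pos + 3
    if last - run_start > best[1] - best[0]:
        best = (run_start, last)
    return best


def choose_frame(seq, blastx_frame=None):
    """Use the BLASTX frame when one is given; otherwise scan all six frames.

    The six-frame scan is an assumed stand-in for the server's ab initio step.
    """
    frames = [blastx_frame] if blastx_frame is not None else [1, 2, 3, -1, -2, -3]
    best_frame, best_span = None, (0, 0)
    for frame in frames:
        span = longest_open_stretch(seq, frame)
        if best_frame is None or span[1] - span[0] > best_span[1] - best_span[0]:
            best_frame, best_span = frame, span
    return best_frame, best_span
```

Calling `choose_frame(seq, blastx_frame=2)` restricts the search to the frame reported by BLASTX, while `choose_frame(seq)` falls back to the six-frame scan.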

Input
  1. A file containing cDNA sequences (ESTs or contig sequences assembled from ESTs) in FASTA format. Note: The number of sequences in a file or in pasted input is not limited.
  2. Optional: A BLASTX output file for all queries. Although the BLASTX output is optional, users are encouraged to provide a pre-computed BLASTX output file for the query sequences, as BLASTX has been shown to be effective in identifying coding regions. To minimize the size of the BLASTX output file for uploading, the following parameters are recommended if the BLASTX program in the 'NCBI-blastall' package is used: "-v 1 -b 1 -e 1e-5" (Note: we used version 2.2.19; earlier or later versions may not work properly). Please note that a "complete, non-truncated" BLASTX output is needed for the program to work correctly. If no BLASTX file is provided, or a query has "no hit" in the BLASTX output, the coding regions of those queries are predicted ab initio. Users can compare the results generated using BLASTX output with the ab initio prediction results.
  3. Note: Because the web server is intended for users with small datasets and is also used for student learning, the total combined data file size is limited to 10 Mb. If you have larger datasets, please request a standalone version and run it locally. Save your sequence file and your BLASTX file with a plain-text editor such as Notepad; do not use MS Word, as the input must be plain text only. In your BLASTX output file, a line containing "BLASTX #.#.#" must accompany each query's output. If there is no such line, your BLASTX output will not be used by the program.
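
The input requirements above can be checked before uploading with a short script. The following is a minimal sketch, not the server's own validation: the function name `check_inputs` is hypothetical, the 10 Mb limit is read here as 10 * 1024 * 1024 bytes, and counting "Query=" lines assumes the legacy blastall report layout.

```python
import re

# Assumption: the "10 Mb" limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_TOTAL_BYTES = 10 * 1024 * 1024
# A line starting with "BLASTX #.#.#", e.g. "BLASTX 2.2.19 [...]".
BLASTX_HEADER = re.compile(r"^BLASTX \d+\.\d+\.\d+", re.MULTILINE)
# Assumption: each query report contains a line starting with "Query=".
QUERY_LINE = re.compile(r"^Query=", re.MULTILINE)


def check_inputs(fasta_text, blastx_text=""):
    """Return a list of problems found in the inputs; an empty list means none."""
    problems = []
    total = len(fasta_text.encode("utf-8")) + len(blastx_text.encode("utf-8"))
    if total > MAX_TOTAL_BYTES:
        problems.append("combined input exceeds the size limit")
    if "\x00" in fasta_text or "\x00" in blastx_text:
        problems.append("input contains NUL bytes, so it is probably not plain text")
    if not fasta_text.lstrip().startswith(">"):
        problems.append("sequence file does not start with a FASTA '>' line")
    if blastx_text:
        headers = len(BLASTX_HEADER.findall(blastx_text))
        queries = len(QUERY_LINE.findall(blastx_text))
        if headers == 0:
            problems.append("no 'BLASTX #.#.#' line found")
        elif headers < queries:
            problems.append("fewer 'BLASTX #.#.#' lines than queries")
    return problems
```

An empty result only means that none of these simple checks failed; it does not guarantee that the server will accept the files.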

Output
A total of four files are generated. The first file (OrfPredictor.pep) is in FASTA format: the definition line contains the query identifier, the frame, and the beginning and end positions of the predicted coding region, and it is followed by the predicted peptide sequence. An 'FS' flag in the definition line means that BLASTX detected a frame shift in the query sequence. Only the most likely open reading frame and one coding region are predicted for a given sequence. The second file contains the identifiers of queries for which no coding region was predicted, i.e., sequences that contain only the 5' or 3' untranslated region. In response to users' requests, two further output files are generated: (1) a file containing the 6-frame translation of the sequences; (2) a file containing the protein-coding DNA sequences extracted from the original sequences.
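
To make the two additional outputs concrete, here is a hedged Python sketch of a 6-frame translation and of extracting coding DNA from a sequence. It uses the standard genetic code and is not the server's implementation. The function names and the 1-based, inclusive coordinate convention on the translated strand are assumptions, since this page does not specify the exact layout of the output files.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(seq):
    """Return a dict mapping frames 1, 2, 3, -1, -2, -3 to peptide strings."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def extract_coding_dna(seq, frame, begin, end):
    """Return the DNA from begin to end on the strand implied by the frame.

    Assumed convention: begin and end are 1-based and inclusive, counted on
    the strand being translated (the reverse complement for negative frames).
    """
    strand = seq if frame > 0 else reverse_complement(seq)
    return strand[begin - 1:end]
```

For example, `six_frame_translation("ATGAAATAGCCC")[1]` gives `"MK*P"`, where `*` marks a stop codon.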

Security of user submitted data
The data submitted to our server will be automatically deleted after they are processed. We do not keep data submitted by a user.

How to obtain your results
The results can be downloaded from the server website. They are kept on the site for only 2 days after processing and are then deleted.

How to cite us
Min, X.J., Butler, G., Storms, R. and Tsang, A. OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res., 2005, Web Server Issue W677-W680. Please include the server URL (http://proteomics.ysu.edu/tools/OrfPredictor.html) in your paper, because the original server site is no longer available.

Standalone OrfPredictor availability
The standalone version of the OrfPredictor software is available free of charge, for academic use only. It is written in Perl and should run on any operating system with a Perl interpreter. Please contact Dr. Min in the YSU Bioinformatics Lab.

Comments and suggestions
Please contact Dr. Min in the YSU Bioinformatics Lab.

