README

James Munyon; Graduate Student and Teaching Assistant, Department of Mathematics and Statistics, Bowling Green State University; jmunyon@bgsu.edu

jsm_proceedings_2015: paper submitted for the proceedings of the 2015 Joint Statistical Meetings, August 2015, Seattle, Washington

unabridged_senior_paper: paper submitted in partial fulfillment of the requirements for the degree of Master of Science in Mathematics, Youngstown State University, Spring 2015

code: this folder contains all of the R code that was used in this project.

data: this folder contains the original dataset in FASTA form, the original dataset transformed and saved in CSV form, the transformed dataset after 50_50 BlastClust was performed (some observations were removed due to redundancy), results from the BlastClust procedure, and several CSV files containing lists of protein identifiers and/or known locations.

results: this folder contains the following subfolders:

    confusion matrices: confusion matrices for each method run on the appropriate testing dataset

    matthews_correlation_coefficient_values: these had to be calculated separately from the other statistics

    sortable_tables: open in Excel to sort these CSV files by various columns

    stats_tables: summary statistics for each method (sensitivity, specificity, etc.)

    stats_tables_rounded: self-explanatory