README
James Munyon; Graduate Student and Teaching Assistant, Department of Mathematics and Statistics, Bowling Green State University; jmunyon@bgsu.edu
jsm_proceedings_2015: paper submitted for the proceedings of the 2015 Joint Statistical Meetings, August 2015, Seattle, Washington
unabridged_senior_paper: paper submitted in partial fulfillment of the requirements for the degree of Master of Science in Mathematics, Youngstown State University, Spring 2015
code:
this folder contains all of the R code that was used in this project.
data:
this folder contains the original dataset in FASTA format; the original dataset, transformed and saved in CSV format; the transformed dataset after 50_50 BlastClust was performed (some observations were removed as redundant); the output of the BlastClust procedure; and various CSV files listing protein identifiers and/or known locations.
results:
this folder also contains the following subfolders:
confusion matrices:
confusion matrices for each method, run on the corresponding testing dataset
matthews_correlation_coefficient_values:
these had to be calculated separately from the other statistics
sortable_tables:
open these CSV files in Excel to sort them by various columns
stats_tables:
summary statistics for each method: sensitivity, specificity, etc.
stats_tables_rounded:
the same tables as stats_tables, with values rounded
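For readers checking the numbers in stats_tables against a confusion matrix, here is a minimal sketch of how sensitivity, specificity, and the Matthews correlation coefficient (MCC) relate to the four cells of a binary (2x2) confusion matrix. It assumes a one-vs-rest binary layout (tp, fp, fn, tn); the function name binary_stats is hypothetical, and this is not the project's R code. Unlike sensitivity or specificity alone, MCC uses all four cells at once.

```python
import math


def binary_stats(tp, fp, fn, tn):
    """Hypothetical helper: summary statistics for a 2x2 confusion matrix.

    Assumes counts of true positives (tp), false positives (fp),
    false negatives (fn), and true negatives (tn).
    """
    # Sensitivity (recall): share of actual positives predicted positive.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of actual negatives predicted negative.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    # MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal total is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    stats = binary_stats(tp=40, fp=10, fn=5, tn=45)
    # Rounded output, similar in spirit to stats_tables_rounded.
    print({k: round(v, 3) for k, v in stats.items()})
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). For multi-class location data, one would apply this per class in a one-vs-rest fashion, which is an assumption about how such tables might be built rather than a description of this project's method.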