M. Bagyamathi, Dr. H. Hannah Inbarani
Recent advances in next-generation sequencing technologies have led to a tremendous rise in the rate at which protein sequence data are obtained. Protein sequence analysis is a significant problem in functional genomics, and the features extracted from these sequences form a high-dimensional space that feature selection techniques are well suited to handle. In this paper, we propose a feature selection algorithm for protein sequences that combines the Improved Harmony Search algorithm with Rough Set Relative Reduct to obtain faster and better search capabilities. Feature vectors are extracted from a protein sequence database based on amino acid composition and K-mer patterns (K-tuples), and feature selection is then carried out on the extracted feature vectors. The proposed algorithm is compared with Improved Harmony Search hybridized with the Rough Set Quick Reduct approach. The experiments are carried out on protein primary single-sequence data sets derived from PDB according to the SCOP classification, using the structural classes all-α, all-β, α+β and α/β. The feature subsets selected by the existing and proposed algorithms are evaluated with decision tree classification algorithms.
Data Mining; Bioinformatics; Feature Selection; Protein Sequence; Rough Set; Relative Reduct; Harmony Search; Protein Sequence Classification
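
The feature extraction step mentioned in the abstract (amino acid composition plus K-mer patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of the 20 standard residues as the alphabet, the default K = 2, and the normalization to relative frequencies are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted;
# any other symbol (e.g. X, B, Z) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_frequencies(seq, k=2):
    """Return the relative frequency of every possible k-mer (20**k values)."""
    seq = seq.upper()
    # All k-mers over the alphabet, in a fixed lexicographic order.
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def extract_features(seq, k=2):
    """Concatenate composition and k-mer features into one vector."""
    return amino_acid_composition(seq) + kmer_frequencies(seq, k)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any data set in the paper.
    vector = extract_features("MKVLAAGIVGL", k=2)
    print(len(vector))  # 20 composition values + 400 2-mer values
```

With K = 2 each sequence already yields 420 features, and the K-mer part grows as 20^K, which is why a feature selection stage follows extraction.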