Disadvantages Of Syntactical Based Features

3.3 Syntactical based features:
Syntactical based features are used mainly for structural class and protein fold prediction. They are basic, simple features derived from the 20 amino acids that make up a protein sequence.
Occurrence (O): A protein is built from 20 unique amino acids. The amino acid occurrence of a protein is the frequency of each amino acid in its sequence, which produces 20 features. It is computed by the equation $n_i = (n_{i1}, n_{i2}, \ldots, n_{ij}, \ldots, n_{i20})$ (3.1), where $i$ indexes the proteins and $n_{i1}$, $n_{i2}$, etc. are the numbers of amino acids of each type (Ala, Arg, etc.) in the $i$-th protein (Taguchi and Gromiha, 2007). This Occurrence (O) feature is very simple and effective to
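As an illustration of equation (3.1), the occurrence vector of a single protein can be sketched in a few lines of Python. The function name `occurrence`, the constant `AMINO_ACIDS`, and the example sequence are hypothetical and not taken from the cited work; the residue order (Ala, Arg, Asn, ...) is assumed to follow the alphabetical order of the three-letter codes.

```python
from collections import Counter

# One-letter codes in the assumed order Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly,
# His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def occurrence(sequence):
    """Return the 20 amino acid counts (n_i1, ..., n_i20) of one protein."""
    counts = Counter(sequence.upper())
    # Counter returns 0 for residues that do not occur in the sequence.
    return [counts[aa] for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = occurrence("MKVLAAGA")
print(len(features))
```

Non-standard symbols (for example X or B) are simply ignored in this sketch; whether the cited method handles them differently is not stated in the text.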

Using these features alone does not improve classification accuracy. Features extracted directly from the sequence syntax do not give good results for the protein fold and structural class problems, and the resulting predictions are not accurate. After Occurrence (O) and Pairwise frequency (PF1) are calculated, the result has to be combined with the PSSM of the protein sequence, which can make the process long.

3.4 Evolutionary based features:
Evolutionary based features are very popular and perform better than syntactical based features. They are based on the Position Specific Scoring Matrix (PSSM), a matrix that defines the probability of any given amino acid occurring at a particular position in the protein sequence.
The bigram feature is a very important feature that represents the transitional probabilities from one amino acid to another. It is based on the PSSM and also produces 400
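A minimal sketch of how a PSSM bigram could be computed is given below. It assumes one commonly used formulation, in which entry (k, l) is the sum over consecutive positions of the PSSM value for amino acid k at position i times the value for amino acid l at position i+1; the text above does not confirm that this exact definition is the one intended. The function name `pssm_bigram` and the toy matrix are hypothetical, and the PSSM is assumed to be already normalised to probabilities.

```python
def pssm_bigram(pssm):
    """Flatten the 20 x 20 bigram matrix of a PSSM into 400 features.

    pssm is a list of L rows (one per sequence position), each holding
    20 values (one per amino acid).
    """
    n = 20
    bigram = [[0.0] * n for _ in range(n)]
    # Pair every position with its successor: (row_1, row_2), (row_2, row_3), ...
    for row, next_row in zip(pssm, pssm[1:]):
        for k in range(n):
            for l in range(n):
                bigram[k][l] += row[k] * next_row[l]
    # Row-major flattening: feature index = 20 * k + l.
    return [value for bigram_row in bigram for value in bigram_row]


def one_hot(index):
    """Toy PSSM row that puts all probability on a single amino acid."""
    return [1.0 if j == index else 0.0 for j in range(20)]


# Hypothetical three-position PSSM: column 0, then column 1, then column 0.
toy_pssm = [one_hot(0), one_hot(1), one_hot(0)]
features = pssm_bigram(toy_pssm)
print(len(features))
```

In the toy matrix the only transitions are 0 to 1 and 1 to 0, so only the features at indices 1 and 20 are non-zero.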

