Splice site detection using principal component analysis and case-based reasoning with support vector machine

Srabanti Maji*¹ and Haripada Bhunia²

1 Computer Science Department
Sri Guru Harkrishan College of Management and Technology, Raipur, Bahadurgarh;
District: Patiala, Punjab, India

2 Department of Chemical Engineering
Thapar University, Patiala-147004, India

*Address correspondence to this author at:
Dr. Srabanti Maji
Computer Science Department,
Sri Guru Harkrishan College of Management and Technology, Raipur, Bahadurgarh;
District: Patiala, Punjab, India

E-mail address: srabantiindia@gmail.com, srabanti9@gmail.com
Tel: +91-9356006454

ABSTRACT

Identification of coding regions in genomic DNA sequences is the foremost step
[…]
feature selection; and the final stage, in which a support vector machine (SVM) with a polynomial kernel is used for final classification. The proposed SpliceCombo model outperforms the other prediction models it was compared with, achieving 97.25% sensitivity and 97.46% specificity for donor splice site prediction, and 96.51% sensitivity and 94.48% specificity for acceptor splice site prediction.

Keywords: Gene Identification; Splice Site; Principal Component Analysis (PCA); Case-Based Reasoning (CBR); Support Vector Machine (SVM)
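The abstract describes a final stage in which an SVM with a polynomial kernel classifies candidate splice sites, and it reports results as sensitivity and specificity. The following is a minimal, standard-library Python sketch of those two ingredients under stated assumptions, not the authors' implementation. The helper names `one_hot`, `polynomial_kernel` and `sensitivity_specificity`, the one-hot window encoding, the window length and the kernel parameters are all hypothetical. The kernel uses the common form K(x, z) = (γ⟨x, z⟩ + r)^d, and the metrics assume the usual classification definitions Sn = TP/(TP + FN) and Sp = TN/(TN + FP); note that some gene-finding literature instead defines specificity as TP/(TP + FP). The PCA and CBR stages are not reproduced.

```python
"""Minimal, hypothetical sketch: one-hot encoding, polynomial kernel, Sn/Sp.

Not the SpliceCombo implementation; PCA and CBR feature selection are omitted.
"""
from typing import List, Sequence, Tuple

NUCLEOTIDES = "ACGT"  # assumed alphabet; any other symbol (e.g. 'N') encodes as zeros


def one_hot(window: str) -> List[int]:
    """Hypothetical encoding: 4 binary features per base, in A, C, G, T order."""
    vec: List[int] = []
    for base in window.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 2, gamma: float = 1.0,
                      coef0: float = 1.0) -> float:
    """K(x, z) = (gamma * <x, z> + coef0) ** degree; defaults are illustrative only."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero (no examples of that class)."""
    return num / den if den else float("nan")


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (Sn, Sp) = (TP/(TP+FN), TN/(TN+FP)); label 1 = true site, 0 = decoy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp)


if __name__ == "__main__":
    a = one_hot("AGGTAAGT")  # made-up 8-nt window containing a GT donor dinucleotide
    b = one_hot("CAGGTGAG")  # another made-up window
    print(len(a))                   # 32 (8 bases x 4 features)
    print(polynomial_kernel(a, b))  # 4.0: the windows share one base, so (1*1 + 1)**2
    print(sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
    # (0.6666666666666666, 0.5)
```

In a full pipeline, kernel values like these would feed an SVM solver, and the encoded vectors would first pass through PCA and CBR-based feature selection; those stages are left out here to keep the sketch dependency-free.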

1. INTRODUCTION

Research in genome sequencing technology has been generating an enormous amount of genomic sequence data, and a main objective of analyzing these data is gene identification. In eukaryotes, predicting coding regions depends on recognizing exon-intron structure. However, predicting exon-intron structure is very challenging because of its structural complexity and the great length of genomic sequences. Analyses of the human genome indicate roughly 20,000–25,000 protein-coding genes [1]. Still, there are nearly 100,000 genes in the human genome, which indicates that a huge number of genes remain unidentified [2,3]. Most of the computational techniques