Detection Using Principal Component Analysis And Case Based Reasoning With Support Vector Machine

2147 Words Feb 3rd, 2015 9 Pages
Splice site detection using principal component analysis and case based reasoning with support vector machine

Srabanti Maji*1 and Haripada Bhunia2

1 Computer Science Department
Sri Guru Harkrishan College of Management and Technology, Raipur, Bahadurgarh;
Dist: Patiala, Punjab, India

2 Department of Chemical Engineering
Thapar University, Patiala-147004, India

*Address Correspondence to this author at
Dr. Srabanti Maji
Computer Science Department,
Sri Guru Harkrishan College of Management and Technology, Raipur, Bahadurgarh;
District: Patiala, Punjab, India

E-mail address: srabantiindia@gmail.com, srabanti9@gmail.com
Tel: +91-9356006454

ABSTRACT

Identification of coding region from genomic DNA sequence is the foremost step
INTRODUCTION

Research in genome sequencing technology has been generating an enormous amount of genomic sequence data, with gene identification as its main objective. In eukaryotes, the prediction of a coding region depends on recognition of the exon-intron structure. However, predicting the exon-intron structure of a sequence is very challenging because of its structural complexity and vast length. Analysis of the human genome has identified roughly 20,000–25,000 protein-coding genes [1]. Yet the human genome has been estimated to contain nearly 100,000 genes, which indicates that a large number of genes are still unidentified [2,3]. Most computational techniques achieve good performance but still have a few drawbacks [4].

A DNA sequence is built from four different nucleotides, i.e., A, T, G and C, and groups of three nucleotides form codons. The gene structures that encode proteins are essential for all living organisms [5]. Eukaryotic protein-coding genes consist of introns and exons. The coding sequence is separated by non-coding sequences, called introns, which are absent in prokaryotes. An open reading frame (ORF) is a DNA sequence beginning with a start codon (ATG) and ending with a stop codon (TAA, TAG or TGA). The separating junction between an exon and an
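The ORF definition given above (a start codon ATG followed in the same reading frame by a stop codon TAA, TAG or TGA) can be illustrated with a minimal Python sketch. This is not the method of the paper; the function name `find_orfs`, the restriction to the forward strand, and the choice to report only the first in-frame stop codon are assumptions made purely for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Scan the three forward reading frames of a DNA string.

    Returns a list of (start, end, orf) tuples, where start is the index
    of the A in ATG and end is one past the last base of the stop codon.
    Hypothetical helper for illustration: the reverse strand is ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                # Walk codon by codon to the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after the stop codon
                        break
            i += 3
    return orfs


if __name__ == "__main__":
    for start, end, orf in find_orfs("CCATGAAATGACC"):
        print(start, end, orf)
```

In the example sequence the only complete ORF lies in the third reading frame (ATG AAA TGA); the ATG in the second frame is not reported because no in-frame stop codon follows it.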