1. INPUT TEXT DOCUMENT

The input to the system is an image of a text document. The image is acquired with a camera or a scanner. Image acquisition obtains the document image in color, gray-level, or binary format.

2. PRE-PROCESSING

The following pre-processing steps are often performed in OCR.

1. Binarization

The simplest way to binarize an image is to choose a threshold value and classify all pixels with values above this threshold as white and all other pixels as black. Selecting a proper threshold is a very important task. In many cases, finding a single threshold that suits the entire image is very difficult, and sometimes impossible. Adaptive image binarization is therefore needed, where an optimal threshold is chosen for each image area.

Binarization is the process of converting a color image into a binary image. The color image is first converted into a grayscale image using the following formula. [2]

There are various binarization methods. In each case the color image is first converted into a grayscale image, and one of the following algorithms is then applied to the grayscale image to convert it into a binary image.

Niblack Algorithm

This is a local thresholding algorithm. Local thresholding algorithms give good results on documents because they calculate a different threshold for each part of the image, based on the pixel values in that part. Niblack’s algorithm calculates a pixel-wise threshold by sliding a
