Document Images Are Acquired By Scanning Journal

Document images are acquired by scanning journals, printed documents, degraded documents, handwritten historical documents, book covers, etc. The text may appear in a virtually unlimited number of fonts, styles, alignments, sizes, shapes, and colors. Extracting text from document images, particularly those with complex color backgrounds, is difficult because of the complexity of the background and because the colors of the foreground text mix with the colors of the background.

In this section, we present the main ideas and details of the proposed algorithm. Implementing any system requires a study of its features, which may be symbolic, numerical, or both. An example of a symbolic feature is color; an example of a numerical feature is weight. Features may also result from applying a text extraction algorithm or operator to the input data. The related problems of feature selection and feature extraction must be addressed at the outset of any text recognition system design. The key is to choose and extract features that are computationally feasible and that reduce the problem data to a manageable amount of information without discarding valuable information. Different methods used for text extraction from document images (as shown in Fig. 1) include:

A. Feature Extraction

Feature extraction involves extracting the meaningful information from the document image. The features are classified into global features and local features. Features that are extracted from the whole image are known as global features.
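To make the idea of a global feature concrete, the following is a minimal sketch rather than the proposed algorithm itself. It assumes a grayscale image stored as a list of rows with pixel values from 0 to 255, and it uses two illustrative features (mean intensity and the ratio of dark pixels) together with a hypothetical threshold of 128. The text above does not define local features, so the per-tile version is an assumption based on the common use of the term: the same measurements taken over small sub-regions instead of the whole image.

```python
from statistics import mean


def global_features(image, threshold=128):
    """Compute features over the whole image (list of rows, values 0-255).

    The feature names and the threshold are illustrative assumptions.
    """
    pixels = [p for row in image for p in row]
    dark = sum(1 for p in pixels if p < threshold)
    return {
        "mean_intensity": mean(pixels),
        "dark_ratio": dark / len(pixels),
    }


def local_features(image, block=2, threshold=128):
    """Compute the same features per non-overlapping block x block tile.

    Treating 'local' as 'per tile' is an assumption, not a definition
    taken from the text.
    """
    feats = []
    for r in range(0, len(image), block):
        for c in range(0, len(image[0]), block):
            tile = [row[c:c + block] for row in image[r:r + block]]
            feats.append(((r, c), global_features(tile, threshold)))
    return feats


# Hypothetical 4x4 patch: dark "text" pixels in the top-left corner
# on a light background.
image = [
    [10, 20, 240, 250],
    [15, 25, 245, 255],
    [230, 235, 240, 245],
    [250, 255, 250, 255],
]

whole = global_features(image)
tiles = local_features(image)
```

For this patch the global dark-pixel ratio is 0.25, which says only that a quarter of the image is dark. The per-tile values show that all of the dark pixels fall in the top-left tile, which is the kind of positional information a whole-image feature cannot provide.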