The paper describes a neural network-based script identification system which can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in automation of document processing, in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage the document image is dilated using 3 x 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, average pixel distribution is found in these resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 x 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. Modified system includes a pre-processor, modified feature extractor and probabilistic neural network classifier. Pre-processor segments the multi-script multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.
|Number of pages||15|
|Journal||Sadhana - Academy Proceedings in Engineering Sciences|
|Issue number||PART 1|
|Publication status||Published - 01-12-2002|
All Science Journal Classification (ASJC) codes