TY - JOUR
T1 - Neural network based system for script identification in Indian documents
AU - Basavaraj Patil, S.
AU - Subbareddy, N. V.
PY - 2002/12/1
Y1 - 2002/12/1
N2 - The paper describes a neural network-based script identification system that can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in the automation of document processing in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage, the document image is dilated using 3 x 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, the average pixel distribution is found in the resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 x 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. The modified system includes a pre-processor, a modified feature extractor and a probabilistic neural network classifier. The pre-processor segments the multi-script, multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.
AB - The paper describes a neural network-based script identification system that can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in the automation of document processing in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage, the document image is dilated using 3 x 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, the average pixel distribution is found in the resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 x 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. The modified system includes a pre-processor, a modified feature extractor and a probabilistic neural network classifier. The pre-processor segments the multi-script, multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.
UR - https://www.scopus.com/pages/publications/0036464677
UR - https://www.scopus.com/inward/citedby.url?scp=0036464677&partnerID=8YFLogxK
U2 - 10.1007/bf02703314
DO - 10.1007/bf02703314
M3 - Article
AN - SCOPUS:0036464677
SN - 0256-2499
VL - 27
SP - 83
EP - 97
JO - Sadhana - Academy Proceedings in Engineering Sciences
JF - Sadhana - Academy Proceedings in Engineering Sciences
IS - PART 1
ER -