Neural network based system for script identification in Indian documents

S. Basavaraj Patil, N. V. Subbareddy

Research output: Contribution to journal › Article › peer-review

60 Citations (Scopus)

Abstract

The paper describes a neural network-based script identification system for the machine reading of documents written in the English, Hindi and Kannada scripts. Script identification is a basic requirement for automating document processing in multi-script, multi-lingual environments. The system comprises a feature extractor and a modular neural network. The feature extractor has two stages. In the first stage, the document image is dilated using 3 x 3 masks in the horizontal, vertical, right diagonal and left diagonal directions. In the second stage, the average pixel distribution is computed for each of the resulting images. The modular network combines separately trained feedforward neural network classifiers, one for each script. This system identifies the script of 64 x 64 pixel document images. At the next level, the system is modified to operate on single-word document images in the same three scripts. The modified system includes a pre-processor, a modified feature extractor and a probabilistic neural network classifier. The pre-processor segments the multi-script, multi-lingual document into individual words. The feature extractor accepts these word images of variable size and still produces the discriminative features used by the probabilistic neural network classifier. Experiments are conducted on a manually developed database of 64 x 64 pixel document images and on a database of individual words in the three scripts. The results are encouraging and support the effectiveness of the approach.
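The two-stage feature extractor and the probabilistic classifier described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions, because the abstract gives no implementation details: the mask coefficients in `MASKS`, the reading of "average pixel distribution" as the fraction of foreground pixels in each dilated image, and the Gaussian kernel with smoothing parameter `sigma` are all hypothetical choices, and the function names are illustrative only. It is not the authors' implementation.

```python
import math

# Hypothetical 3 x 3 directional masks, written as (row, column) offsets.
# The abstract names the four directions but not the mask coefficients.
MASKS = {
    "horizontal": [(0, -1), (0, 0), (0, 1)],
    "vertical": [(-1, 0), (0, 0), (1, 0)],
    "right_diagonal": [(-1, 1), (0, 0), (1, -1)],
    "left_diagonal": [(-1, -1), (0, 0), (1, 1)],
}


def dilate(image, offsets):
    """Binary dilation: set a pixel if any neighbour at the offsets is set."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc]:
                    out[r][c] = 1
                    break
    return out


def pixel_density(image):
    """Fraction of foreground pixels (assumed meaning of 'average pixel distribution')."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def extract_features(image):
    """One density value per direction; works for images of any size."""
    return [pixel_density(dilate(image, offsets)) for offsets in MASKS.values()]


def pnn_classify(x, training, sigma=0.1):
    """Parzen-window classifier: pick the label with the largest mean Gaussian kernel."""
    scores = {}
    for label, samples in training.items():
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * sigma ** 2))
            for s in samples
        )
        scores[label] = kernel_sum / len(samples)
    return max(scores, key=scores.get)


# Toy 5 x 5 image containing a single three-pixel horizontal stroke.
stroke = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
features = extract_features(stroke)
```

For the toy stroke, horizontal dilation only lengthens the stroke, while the other three directions thicken it, so the horizontal density comes out lowest. This kind of directional contrast is what makes the features discriminative. Because each feature is a density rather than a pixel count, the same function applies to word images of variable size, which is the property the modified system relies on.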

Original language: English
Pages (from-to): 83-97
Number of pages: 15
Journal: Sadhana - Academy Proceedings in Engineering Sciences
Volume: 27
Issue number: PART 1
Publication status: Published - 01-12-2002

All Science Journal Classification (ASJC) codes

  • General
