Zone-based structural feature extraction for script identification from Indian documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Automatic identification of the script in a document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. This paper proposes a zone-based structural feature extraction scheme for recognizing South Indian scripts along with English and Hindi. The document images are segmented into lines, each line image is divided into zones, and structural features are extracted. A total of 37 features are extracted at the first level and then reduced to an optimal subset using wrapper and filter selection approaches. K-nearest neighbor and support vector machine classifiers are used for classification. A classification accuracy of 100% is achieved on the optimal feature set.
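The line and zone segmentation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how lines or zones are found, so the horizontal projection profile, the three equal-height zones, and the ink-density feature used here are all assumptions, as are the function names `segment_lines`, `split_zones`, and `zone_densities`. The paper's 37 structural features are not reproduced.

```python
import numpy as np

def segment_lines(binary):
    """Split a binary page (1 = ink, 0 = background) into text-line images.

    Assumption: lines are separated by rows containing no ink, found with
    a horizontal projection profile (row-wise ink counts).
    """
    has_ink = binary.sum(axis=1) > 0
    lines, start = [], None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row                       # a text line begins
        elif not ink and start is not None:
            lines.append(binary[start:row])   # a text line ends
            start = None
    if start is not None:                     # line touching the bottom edge
        lines.append(binary[start:])
    return lines

def split_zones(line, n_zones=3):
    """Divide a line image into n_zones horizontal bands of near-equal height
    (hypothetical zoning; the paper may define zones differently)."""
    return np.array_split(line, n_zones, axis=0)

def zone_densities(line, n_zones=3):
    """Placeholder feature: fraction of ink pixels in each zone."""
    return [float(z.mean()) if z.size else 0.0
            for z in split_zones(line, n_zones)]

# Synthetic page with two "text lines" separated by blank rows.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:4, 2:6] = 1
page[6:9, 1:7] = 1

lines = segment_lines(page)
features = [zone_densities(line) for line in lines]
```

In a fuller pipeline, each per-line feature vector would then go through feature selection and a K-nearest neighbor or support vector machine classifier, as the abstract outlines.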

Original language: English
Title of host publication: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
Pages: 420-425
Number of pages: 6
Publication status: Published - 2010
Event: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 - Mangalore, Karnataka, India
Duration: 29-07-2010 to 01-08-2010

Conference

Conference: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
Country/Territory: India
City: Mangalore, Karnataka
Period: 29-07-10 to 01-08-10

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management
  • Industrial and Manufacturing Engineering
