Abstract
Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, a zone-based structural feature extraction scheme for the recognition of South Indian scripts, along with English and Hindi, is proposed. The document images are segmented into lines, each line image is divided into different zones, and structural features are extracted. A total of 37 features were extracted at the first level and then reduced to an optimal number of features using wrapper and filter selection approaches. The K-nearest neighbor and support vector machine classifiers are used for classification. A classification accuracy of 100% is achieved on the optimal feature set.
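The line segmentation and zoning steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how lines are found, how many zones are used, or what the 37 structural features are, so everything below is an assumption made for illustration: lines are located with a horizontal projection profile, each line is split into three equal-height bands, and ink density per band stands in for a structural feature. The names `segment_lines` and `zone_densities` are hypothetical.

```python
from typing import List, Tuple

# Binary image as rows of pixels: 1 = ink, 0 = background (assumed input format).
Image = List[List[int]]


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows that contain ink."""
    lines = []
    start = None
    for r, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = r                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, r))       # a blank row ends the text line
            start = None
    if start is not None:
        lines.append((start, len(img)))    # line runs to the bottom edge
    return lines


def zone_densities(img: Image, top: int, bottom: int, n_zones: int = 3) -> List[float]:
    """Split rows [top, bottom) into n_zones horizontal bands; return ink density per band."""
    height = bottom - top
    feats = []
    for z in range(n_zones):
        z_top = top + (z * height) // n_zones
        z_bot = top + ((z + 1) * height) // n_zones
        pixels = [p for row in img[z_top:z_bot] for p in row]
        # An empty band (line shorter than n_zones rows) contributes 0.0.
        feats.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return feats


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    for top, bottom in segment_lines(page):
        print((top, bottom), zone_densities(page, top, bottom))
    # prints: (1, 4) [0.5, 1.0, 0.25]
```

In a fuller pipeline, the per-line feature vectors would then go through feature selection and into a K-nearest neighbor or support vector machine classifier; those stages are omitted here because the abstract gives no detail on their configuration.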
Original language | English |
---|---|
Title of host publication | 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 |
Pages | 420-425 |
Number of pages | 6 |
Publication status | Published - 2010 |
Event | 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 - Mangalore, Karnataka, India. Duration: 29-07-2010 → 01-08-2010 |
Conference
Conference | 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 |
---|---|
Country/Territory | India |
City | Mangalore, Karnataka |
Period | 29-07-10 → 01-08-10 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Information Systems and Management
- Industrial and Manufacturing Engineering