TY - GEN
T1 - Zone-based structural feature extraction for script identification from Indian documents
AU - Gopakumar, Rajesh
AU - Subbareddy, N. V.
AU - Makkithaya, Krishnamoorthi
AU - Dinesh Acharya, U.
PY - 2010
Y1 - 2010
N2 - Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, a zone-based structural feature extraction scheme for the recognition of South Indian scripts, along with English and Hindi, is proposed. The document images are segmented into lines, each line image is divided into zones, and structural features are extracted. A total of 37 features are extracted at the first level and then reduced to an optimal number of features using wrapper and filter selection approaches. K-nearest neighbor and support vector machine classifiers are used for classification. A classification accuracy of 100% is achieved on the optimal feature set.
UR - https://www.scopus.com/pages/publications/77958598509
UR - https://www.scopus.com/inward/citedby.url?scp=77958598509&partnerID=8YFLogxK
U2 - 10.1109/ICIINFS.2010.5578668
DO - 10.1109/ICIINFS.2010.5578668
M3 - Conference contribution
AN - SCOPUS:77958598509
SN - 9781424466535
T3 - 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
SP - 420
EP - 425
BT - 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
T2 - 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
Y2 - 29 July 2010 through 1 August 2010
ER -