Categorical Boosting Machine for Tamil Character Recognition Using Shape Based Features

  • Vishnu Mukundan Vishnu
  • Isha Nevatia
  • Tusar Kanti Mishra*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Tamil is one of the world's oldest surviving languages, and modern Indian scripts draw inspiration from it. However, Optical Character Recognition (OCR) techniques for Tamil characters have seen limited development. This paper describes an OCR model for Tamil vowels. The approach uses a CHFR feature extraction technique to derive three distinct feature vectors from the contour and centroid of each character. These feature vectors are combined into a single vector, which is passed to a CatBoost classifier. Hyperparameter optimisation is performed with the Optuna framework, and the model is compared against existing models. The proposed scheme is applied to a dataset of 6,972 handwritten samples divided evenly into 12 classes (581 samples per class), one for each vowel of the Tamil alphabet. Accuracy is the primary evaluation metric, and the model achieves a test accuracy of 84.36%.
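
The step of building three feature vectors per character and merging them into one can be illustrated with a minimal sketch. The abstract does not define the CHFR technique, so the three parts below (a centroid-distance histogram, an angle histogram, and two summary statistics) are hypothetical stand-ins chosen only to show the concatenation, not the authors' features. The function names and the `n_bins` parameter are likewise assumptions.

```python
import math


def centroid(points):
    """Return the mean x and mean y of a list of (x, y) contour points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def shape_features(points, n_bins=8):
    """Build one feature vector from three illustrative shape descriptors.

    NOTE: these descriptors are assumptions for illustration only;
    they are not the CHFR features used in the paper.
    """
    cx, cy = centroid(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    max_d = max(dists) or 1.0  # avoid division by zero for a single point
    weight = 1.0 / len(points)

    # Part 1: histogram of scale-normalised centroid-to-contour distances.
    dist_hist = [0.0] * n_bins
    for d in dists:
        idx = min(int(d / max_d * n_bins), n_bins - 1)
        dist_hist[idx] += weight

    # Part 2: histogram of contour-point angles around the centroid.
    angle_hist = [0.0] * n_bins
    for x, y in points:
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        idx = min(int(theta / (2 * math.pi) * n_bins), n_bins - 1)
        angle_hist[idx] += weight

    # Part 3: mean and standard deviation of the normalised distances.
    norm = [d / max_d for d in dists]
    mean = sum(norm) / len(norm)
    std = math.sqrt(sum((v - mean) ** 2 for v in norm) / len(norm))

    # Concatenate the three parts into a single vector for the classifier.
    return dist_hist + angle_hist + [mean, std]
```

With the default `n_bins=8`, each character yields an 18-element vector. In a pipeline like the one described, a vector of this kind would be the input row for a gradient-boosting classifier such as CatBoost; that stage is not shown here.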

Original language: English
Title of host publication: 2023 7th International Conference On Computing, Communication, Control And Automation, ICCUBEA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350304268
Publication status: Published - 2023
Event: 2023 7th International Conference On Computing, Communication, Control And Automation, ICCUBEA 2023 - Pune, India
Duration: 18-08-2023 to 19-08-2023

Publication series

Name: 2023 7th International Conference On Computing, Communication, Control And Automation, ICCUBEA 2023

Conference

Conference: 2023 7th International Conference On Computing, Communication, Control And Automation, ICCUBEA 2023
Country/Territory: India
City: Pune
Period: 18-08-23 to 19-08-23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Signal Processing
  • Information Systems and Management
  • Control and Optimization
  • Modelling and Simulation
