
Hate or Non-hate: Translation based hate speech identification in Code-Mixed Hinglish data set

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Hate speech identification in social media has emerged as a highly debated research topic in computational linguistics. Understanding linguistic phenomena in low-resource languages, in particular, remains a major problem in natural language processing. Code-mixing is a common phenomenon in social media writing, particularly in multilingual societies such as India. Traditional deep learning techniques trained on monolingual data do not perform well on code-mixed data, and training new models is challenging due to a lack of resources. Converting multilingual data into monolingual data is an important solution to this challenge. TIF-DNN, a Transformer-based Interpretation and Feature Extraction Model, is proposed in this work for hate speech identification. We used the IndicNLP and Englishtohindi libraries for transliteration and translation, respectively, and mBERT for feature extraction in the proposed model. We then compared our results with various baseline and existing models.
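
The stages named in the abstract (transliteration, translation, feature extraction, classification) can be pictured as a chain of pluggable functions. The following is only a minimal sketch under stated assumptions, not the TIF-DNN implementation: the class `CodeMixedPipeline`, the helper `map_tokens`, the tiny lexicons, and the `FLAGGED_TERM` placeholder are all hypothetical, and the toy stand-ins do not call IndicNLP, Englishtohindi, or mBERT. Treating Hindi as the target language is also an assumption inferred from the library names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def map_tokens(text: str, lexicon: Dict[str, str]) -> str:
    """Replace each whitespace-separated token that appears in the lexicon."""
    return " ".join(lexicon.get(tok.lower(), tok) for tok in text.split())


@dataclass
class CodeMixedPipeline:
    """Hypothetical container: each stage is an injected callable."""
    transliterate: Callable[[str], str]    # romanized Hindi -> native script
    translate: Callable[[str], str]        # remaining English -> Hindi
    embed: Callable[[str], List[float]]    # text -> feature vector
    classify: Callable[[List[float]], str] # feature vector -> label

    def to_monolingual(self, text: str) -> str:
        # Transliterate first, then translate whatever is still English.
        return self.translate(self.transliterate(text))

    def predict(self, text: str) -> str:
        return self.classify(self.embed(self.to_monolingual(text)))


# Toy stand-ins (assumptions for illustration only, not real library calls).
ROMAN_TO_DEVANAGARI = {"yeh": "यह", "nahi": "नहीं"}
ENGLISH_TO_HINDI = {"movie": "फिल्म", "good": "अच्छा"}
FLAGGED = {"FLAGGED_TERM"}  # placeholder for a flagged-term lexicon


def toy_embed(text: str) -> List[float]:
    """Two toy features: token count and number of flagged tokens."""
    tokens = text.split()
    return [float(len(tokens)), float(sum(tok in FLAGGED for tok in tokens))]


def toy_classify(features: List[float]) -> str:
    """Label as 'hate' when at least one flagged token was counted."""
    return "hate" if features[1] > 0 else "non-hate"


pipeline = CodeMixedPipeline(
    transliterate=lambda t: map_tokens(t, ROMAN_TO_DEVANAGARI),
    translate=lambda t: map_tokens(t, ENGLISH_TO_HINDI),
    embed=toy_embed,
    classify=toy_classify,
)
```

In a real system each callable would presumably wrap the corresponding library or model; keeping the stages injectable means the monolingual conversion can be checked separately from the classifier.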

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
Editors: Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2470-2475
Number of pages: 6
ISBN (Electronic): 9781665439022
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States
Duration: 15-12-2021 to 18-12-2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021

Conference

Conference: 2021 IEEE International Conference on Big Data, Big Data 2021
Country/Territory: United States
City: Virtual, Online
Period: 15-12-21 to 18-12-21

All Science Journal Classification (ASJC) codes

  • Information Systems and Management
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Information Systems
