TY - GEN
T1 - Hate or Non-hate
T2 - 2021 IEEE International Conference on Big Data, Big Data 2021
AU - Biradar, Shankar
AU - Saumya, Sunil
AU - Chauhan, Arun
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Hate speech identification in social media has emerged as a highly debated research topic in computational linguistics. Understanding linguistic phenomena in low-resource languages, in particular, remains a major problem in natural language processing. Code-mixing is a common phenomenon in social media writing, particularly in multilingual societies such as India. Traditional deep learning techniques trained on monolingual data do not perform well on code-mixed data, and training new models is challenging due to a lack of resources. Converting multilingual data into monolingual data is an important solution to this challenge. TIF-DNN, a Transformer-based Interpretation and Feature Extraction Model, is proposed in this work for hate speech identification. We used the IndicNLP and Englishtohindi libraries for transliteration and translation, respectively, and mBERT for feature extraction in our proposed model. We then compared our findings with various baseline and existing models.
AB - Hate speech identification in social media has emerged as a highly debated research topic in computational linguistics. Understanding linguistic phenomena in low-resource languages, in particular, remains a major problem in natural language processing. Code-mixing is a common phenomenon in social media writing, particularly in multilingual societies such as India. Traditional deep learning techniques trained on monolingual data do not perform well on code-mixed data, and training new models is challenging due to a lack of resources. Converting multilingual data into monolingual data is an important solution to this challenge. TIF-DNN, a Transformer-based Interpretation and Feature Extraction Model, is proposed in this work for hate speech identification. We used the IndicNLP and Englishtohindi libraries for transliteration and translation, respectively, and mBERT for feature extraction in our proposed model. We then compared our findings with various baseline and existing models.
UR - https://www.scopus.com/pages/publications/85125302345
UR - https://www.scopus.com/pages/publications/85125302345#tab=citedBy
U2 - 10.1109/BigData52589.2021.9671526
DO - 10.1109/BigData52589.2021.9671526
M3 - Conference contribution
AN - SCOPUS:85125302345
T3 - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
SP - 2470
EP - 2475
BT - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
A2 - Chen, Yixin
A2 - Ludwig, Heiko
A2 - Tu, Yicheng
A2 - Fayyad, Usama
A2 - Zhu, Xingquan
A2 - Hu, Xiaohua Tony
A2 - Byna, Suren
A2 - Liu, Xiong
A2 - Zhang, Jianping
A2 - Pan, Shirui
A2 - Papalexakis, Vagelis
A2 - Wang, Jianwu
A2 - Cuzzocrea, Alfredo
A2 - Ordonez, Carlos
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 December 2021 through 18 December 2021
ER -