Abstract
Offensive content has received considerable attention in recent years. The amount of offensive content on social media is increasing at an alarming rate, making the need to address it greater than ever. To this end, the organizers of “Dravidian-Code Mixed HASOC-2021” created two tasks. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 covers Malayalam and Tamil code-mixed sentences. Our team participated in Task 2. In our proposed model, we used multilingual BERT to extract features and trained two different classifiers, a Support Vector Machine (SVM) and a Deep Neural Network (DNN), on the extracted features. In addition, we evaluated the performance of a monolingual BERT classifier on the shared task data. Our best-performing model, monolingual BERT, achieved a weighted F1 score of 0.70 on the Malayalam data, ranking fifth; we also obtained a weighted F1 score of 0.573 on the Tamil code-mixed data, ranking twelfth.
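The results in the abstract are reported as weighted F1 scores. As a rough illustration of that metric (not the authors' evaluation code), the sketch below computes the F1 score of each class and averages them weighted by the number of true instances per class. The function name `weighted_f1` and the labels `"OFF"` and `"NOT"` are hypothetical, chosen only for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Illustrative sketch only; labels that never occur in y_true
    have zero support and therefore contribute nothing.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Hypothetical labels: "OFF" = offensive, "NOT" = not offensive.
gold = ["OFF", "OFF", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "OFF"]
print(round(weighted_f1(gold, pred), 3))
```

Because each class is weighted by its support, the majority class influences the score more than it would under a macro average, which matters when offensive and non-offensive examples are imbalanced.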
| Original language | English |
|---|---|
| Pages (from-to) | 680-687 |
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3159 |
| Publication status | Published - 2021 |
| Event | Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India. Duration: 13-12-2021 → 17-12-2021 |
All Science Journal Classification (ASJC) codes
- General Computer Science