TY - GEN
T1 - BERT based Transformers lead the way in Extraction of Health Information from Social Media
AU - Ramesh, Sidharth
AU - Tiwari, Abhiraj
AU - Choubey, Parthivi
AU - Kashyap, Saisha
AU - Khose, Sahil
AU - Lakara, Kumud
AU - Singh, Nishesh
AU - Verma, Ujjwal
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - This paper describes our submissions to the Social Media Mining for Health (SMM4H) 2021 shared tasks. We participated in two tasks: (1) classification, extraction, and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) classification of COVID-19 tweets containing symptoms (Task-6). Our approach for the first task uses the language representation model RoBERTa with a binary classification head. For the second task, we use BERTweet, which is based on RoBERTa. For both tasks, the pre-trained models are fine-tuned and placed on top of a custom domain-specific pre-processing pipeline. Our system ranked first among all submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our system obtained an F1-score of 50%, with improvements of up to +8% F1 over the score averaged across all submissions. The BERTweet model achieved an F1-score of 94% on SMM4H 2021 Task-6.
AB - This paper describes our submissions to the Social Media Mining for Health (SMM4H) 2021 shared tasks. We participated in two tasks: (1) classification, extraction, and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) classification of COVID-19 tweets containing symptoms (Task-6). Our approach for the first task uses the language representation model RoBERTa with a binary classification head. For the second task, we use BERTweet, which is based on RoBERTa. For both tasks, the pre-trained models are fine-tuned and placed on top of a custom domain-specific pre-processing pipeline. Our system ranked first among all submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our system obtained an F1-score of 50%, with improvements of up to +8% F1 over the score averaged across all submissions. The BERTweet model achieved an F1-score of 94% on SMM4H 2021 Task-6.
UR - http://www.scopus.com/inward/record.url?scp=85132981319&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132981319&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85132981319
T3 - Social Media Mining for Health, SMM4H 2021 - Proceedings of the 6th Workshop and Shared Tasks
SP - 33
EP - 38
BT - Social Media Mining for Health, SMM4H 2021 - Proceedings of the 6th Workshop and Shared Tasks
A2 - Magge, Arjun
A2 - Klein, Ari Z.
A2 - Miranda-Escalada, Antonio
A2 - Al-garadi, Mohammed Ali
A2 - Alimova, Ilseyar
A2 - Miftahutdinov, Zulfat
A2 - Farre-Maduell, Eulalia
A2 - Lopez, Salvador Lima
A2 - Flores, Ivan
A2 - O'Connor, Karen
A2 - Weissenbacher, Davy
A2 - Tutubalina, Elena
A2 - Sarker, Abeed
A2 - Banda, Juan M.
A2 - Krallinger, Martin
A2 - Gonzalez-Hernandez, Graciela
PB - Association for Computational Linguistics (ACL)
T2 - 6th Workshop and Shared Tasks on Social Media Mining for Health, SMM4H 2021
Y2 - 10 June 2021
ER -