Abstract
This study aims to optimize sentiment analysis of Twitter data gathered by scraping and with Tweepy. To that end, several preprocessing methods are compared in terms of their effect on the dataset in three conditions: as-is, oversampled, and undersampled. After preprocessing, we applied two feature extraction techniques, CountVectorizer and TF-IDF, to identify the combination of preprocessing and feature extraction that works best for sentiment classification. The resulting feature sets were then evaluated comparatively with a machine learning model and a deep learning model: Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) networks, respectively. The combination of oversampling, CountVectorizer, and SVM achieved the highest accuracy, 93.3%, and oversampling with CountVectorizer produced the largest performance improvement for the SVM classifier. The LSTM model, applied to the same preprocessed data, reached an accuracy of 87.7%. These findings motivate the development of more robust frameworks for analysing sentiment in social media data, with value for application areas beyond public opinion mining and social media analytics.
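As a rough illustration of the steps named in the abstract, the sketch below shows random oversampling followed by count-based and TF-IDF features, using only the Python standard library. It is not the authors' code: the toy tweets, the function names, whitespace tokenization, and the smoothed IDF formula are all assumptions made for this example, and the classifier stage (SVM or LSTM) is omitted. The study itself refers to CountVectorizer, which is a scikit-learn class, so the actual pipeline presumably relied on library implementations.

```python
import math
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def count_features(texts):
    """Bag-of-words term counts per document over a shared, sorted vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tfidf_features(texts):
    """Term counts reweighted by a smoothed inverse document frequency
    (assumed formula: log((1 + n) / (1 + df)) + 1, no row normalization)."""
    vocab, rows = count_features(texts)
    n_docs = len(rows)
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in rows]


# Hypothetical toy data, not taken from the study.
texts = ["great phone love it", "love the camera", "great battery", "terrible screen"]
labels = ["pos", "pos", "pos", "neg"]

bal_texts, bal_labels = random_oversample(texts, labels)
vocab, counts = count_features(bal_texts)
_, tfidf = tfidf_features(bal_texts)
print(Counter(bal_labels))
```

In a real experiment, oversampling would normally be applied to the training split only, so that duplicated samples do not leak into the test set; whether the study did so is not stated in the abstract.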
| Original language | English |
|---|---|
| Article number | 915 |
| Journal | SN Computer Science |
| Volume | 5 |
| Issue number | 7 |
| Publication status | Published - 10-2024 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Computational Theory and Mathematics
- Artificial Intelligence