N-Gram Assisted Youtube Spam Comment Detection

Shreyas Aiyar, Nisha P. Shetty

Research output: Contribution to journal › Conference article › peer-review

29 Citations (Scopus)


This paper proposes a novel methodology for detecting intrusive comments, or spam, on the video-sharing website YouTube. We define spam comments as those that have a promotional intent or that are deemed contextually irrelevant to a given video. Over the years, the prospect of monetisation through advertising on popular social media channels has attracted an increasingly large number of users. This has in turn led to a growth in malicious users who have begun to develop automated bots capable of large-scale, orchestrated deployment of spam messages across multiple channels simultaneously. The presence of these comments significantly hurts both the reputation of a channel and the experience of normal users. YouTube itself has tackled this issue with very limited methods, which revolve around blocking comments that contain links. Such methods have proven to be extremely ineffective, as spammers have found ways to bypass these heuristics. Standard machine learning classification algorithms have proven to be somewhat effective, but there is still room for better accuracy with new approaches. In this work, we attempt to detect such comments by applying conventional machine learning algorithms such as Random Forest, Support Vector Machine, and Naive Bayes, along with certain custom heuristics such as N-Grams, which have proven to be very effective in detecting and subsequently combating spam comments.
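As a rough illustration of how N-gram features can feed one of the classifiers named above, the following minimal sketch pairs word unigrams and bigrams with a multinomial Naive Bayes model using add-one smoothing. It is not the authors' implementation: the helper `word_ngrams`, the class `NgramNaiveBayes`, the choice of word-level N-grams up to length 2, and the six training comments are all hypothetical and made up for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased comment."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, comments, labels):
        self.counts = {}                   # label -> Counter of n-gram counts
        self.doc_counts = Counter(labels)  # label -> number of training comments
        self.vocab = set()                 # every n-gram seen during training
        for text, label in zip(comments, labels):
            grams = word_ngrams(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + len(self.vocab)
            for gram in word_ngrams(text):
                score += math.log((gram_counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only (not from the paper).
train_comments = [
    "check out my channel for free gift cards",
    "subscribe to my channel and win a free phone",
    "click this link to earn money fast",
    "great song I love the guitar solo",
    "this video made my day thanks for uploading",
    "the chorus at the end is amazing",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NgramNaiveBayes().fit(train_comments, train_labels)
prediction_spam = model.predict("subscribe to my channel for free money")
prediction_ham = model.predict("I love this song")
```

Bigrams such as "my channel" carry promotional phrasing that single words miss, which is one plausible reason N-gram features help on short comments. A real evaluation would need a labelled comment corpus and a held-out test split.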

Original language: English
Pages (from-to): 174-182
Number of pages: 9
Journal: Procedia Computer Science
Publication status: Published - 01-01-2018
Event: 2018 International Conference on Computational Intelligence and Data Science, ICCIDS 2018 - Gurugram, India
Duration: 07-04-2018 to 08-04-2018

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

