Detection of phishing websites using data mining tools and techniques

Mansi Somani, Mamatha Balachandra

Research output: Contribution to journal › Article › peer-review


Phishing, a prevalent cyber-security issue, is one of the most common attacks used to obtain users' sensitive information. To eradicate it, users or software must first detect it. A popular way to carry out phishing is to generate phishing URLs. A URL is either legitimate or phishy, which makes phishing detection a natural classification problem in data mining. Hence, five data mining algorithms (C4.5 (J48), SVM, Random Forest, Treebag and GBM) have been trained and compared on accuracy, recall and precision to determine the most suitable model. Rules have been listed that categorise the features which make a website phishy or legitimate. The work has been done in the R language using RStudio. The dataset used comprises 11,055 tuples and 31 attributes. It is used for training, testing and detection. Among the five classifiers, the Random Forest model achieves the best accuracy, 97.21%.
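The three comparison measures named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity rather than the R used in the paper, and it is not the authors' code: the function name `classification_metrics`, the label strings and the toy predictions are all hypothetical, and treating "phishing" as the positive class is an assumption.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "phishing",  # assumption: phishing is the positive class
) -> Dict[str, float]:
    """Compute accuracy, precision and recall for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    return {
        # Share of all URLs classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of URLs flagged as phishing that really are phishing.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real phishing URLs that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only; these are not results from the paper.
y_true = ["phishing", "phishing", "legitimate", "legitimate", "phishing", "legitimate"]
y_pred = ["phishing", "legitimate", "legitimate", "phishing", "phishing", "legitimate"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Running the same function on the held-out predictions of each trained model would give one row per classifier, which is the kind of side-by-side comparison the abstract describes.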

Original language: English
Pages (from-to): 167-183
Number of pages: 17
Journal: International Journal of Advanced Intelligence Paradigms
Issue number: 1-2
Publication status: Published - 2022

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Engineering (all)
  • Applied Mathematics

