TY - JOUR
T1 - Detection of phishing websites using data mining tools and techniques
AU - Somani, Mansi
AU - Balachandra, Mamatha
N1 - Publisher Copyright: © 2022 Inderscience Publishers. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Phishing, a prevalent cyber-security issue, is one of the most common attacks used to obtain users' sensitive information. To eradicate it, users or software must first detect it. A popular way to carry out phishing is by generating phishing URLs. A URL is either legitimate or phishy, which makes phishing detection a natural classification problem in data mining. Hence, five data mining algorithms (C4.5 (J48), SVM, Random Forest, Treebag and GBM) have been trained and compared on accuracy, recall and precision to determine the best-suited model. Rules have been listed that categorise the features which make a website phishy or legitimate. The work has been done in the R language using RStudio. The dataset used comprises 11,055 tuples and 31 attributes. It is used for training, testing and detection. Among the five classifiers, the Random Forest model obtains the best accuracy, 97.21%.
AB - Phishing, a prevalent cyber-security issue, is one of the most common attacks used to obtain users' sensitive information. To eradicate it, users or software must first detect it. A popular way to carry out phishing is by generating phishing URLs. A URL is either legitimate or phishy, which makes phishing detection a natural classification problem in data mining. Hence, five data mining algorithms (C4.5 (J48), SVM, Random Forest, Treebag and GBM) have been trained and compared on accuracy, recall and precision to determine the best-suited model. Rules have been listed that categorise the features which make a website phishy or legitimate. The work has been done in the R language using RStudio. The dataset used comprises 11,055 tuples and 31 attributes. It is used for training, testing and detection. Among the five classifiers, the Random Forest model obtains the best accuracy, 97.21%.
UR - http://www.scopus.com/inward/record.url?scp=85131198228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131198228&partnerID=8YFLogxK
U2 - 10.1504/IJAIP.2022.123021
DO - 10.1504/IJAIP.2022.123021
M3 - Article
AN - SCOPUS:85131198228
SN - 1755-0386
VL - 22
SP - 167
EP - 183
JO - International Journal of Advanced Intelligence Paradigms
JF - International Journal of Advanced Intelligence Paradigms
IS - 1-2
ER -