Abstract
Email spam detection remains a critical challenge in cybersecurity because malicious campaigns keep growing more sophisticated. Effective filtering is essential to protect users from phishing, fraud, and privacy violations. Traditional machine learning and rule-based approaches often fail to adapt to evolving spam patterns. Although deep learning models such as BERT have shown strong performance, many existing studies still rely on traditional feature extraction methods such as TF-IDF, which limits semantic understanding, or offer no explanation for their predictions. This paper addresses these gaps by evaluating on the Enron and SpamAssassin datasets and the LingSpam and Phishing Email datasets, using DistilBERT and BERT embeddings to represent emails in a semantically rich feature space. Seven models, namely Logistic Regression, Random Forest, Naive Bayes, Decision Tree, KNN, AdaBoost, and a Neural Network (MLP Classifier), were evaluated with hyperparameter tuning. The Neural Network with DistilBERT achieved the highest accuracies of 0.99 and 0.98, while BERT achieved 0.98 and 0.97 across both datasets. Furthermore, to improve interpretability, FLAN-T5 was incorporated to generate natural language explanations of model decisions. The proposed approach balances accuracy and interpretability, making it suitable for real-world deployment.
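The embedding-plus-classifier pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the checkpoint name `distilbert-base-uncased`, the mean-pooling step, the 80/20 split, and the small hyperparameter grid are all assumptions, and the helper names `embed_emails` and `tune_mlp` are hypothetical. The FLAN-T5 explanation step is not covered here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier


def embed_emails(texts, model_name="distilbert-base-uncased", batch_size=16):
    """Return one vector per email (texts is a list of strings).

    Assumption: token embeddings are mean-pooled over non-padding tokens;
    the abstract does not say which pooling the paper uses.
    """
    # Imported lazily so the classifier code below runs without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(
                texts[start:start + batch_size],
                padding=True,
                truncation=True,
                max_length=512,
                return_tensors="pt",
            )
            hidden = model(**batch).last_hidden_state             # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
            chunks.append(pooled.numpy())
    return np.vstack(chunks)


def tune_mlp(features, labels, seed=0):
    """Grid-search an MLP on embedding features; return (model, test accuracy).

    Assumption: the grid and the 80/20 stratified split are illustrative only.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    grid = {
        "hidden_layer_sizes": [(64,), (128, 64)],
        "alpha": [1e-4, 1e-3],
    }
    search = GridSearchCV(
        MLPClassifier(max_iter=300, random_state=seed),
        grid,
        cv=3,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # search.score applies the accuracy scorer to the held-out split.
    return search.best_estimator_, search.score(X_test, y_test)
```

In use, one would call `features = embed_emails(email_texts)` on a list of raw email strings and then `model, accuracy = tune_mlp(features, labels)`. Passing `model_name="bert-base-uncased"` would give a BERT-based comparison under the same assumptions, and the other six classifiers could be swapped in for `MLPClassifier` with their own grids.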
| Original language | English |
|---|---|
| Article number | 146 |
| Journal | SN Applied Sciences |
| Volume | 8 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 02-2026 |
All Science Journal Classification (ASJC) codes
- General Chemical Engineering
- General Materials Science
- General Environmental Science
- General Engineering
- General Physics and Astronomy
- General Earth and Planetary Sciences
Article title: An LLM driven framework for email spam detection using DistilBERT embeddings and neural classifiers