ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.17092
33
0

An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection

27 February 2024
Van Nguyen
Tingmin Wu
Xingliang Yuan
M. Grobler
Surya Nepal
Carsten Rudolph
    AAML
ArXivPDFHTML
Abstract

Phishing attacks have become a serious and challenging issue for detection, explanation, and defense. Despite more than a decade of research on phishing, encompassing both technical and non-technical remedies, phishing continues to be a serious problem. Nowadays, AI-based phishing detection stands out as one of the most effective solutions for defending against phishing attacks by providing vulnerability (i.e., phishing or benign) predictions for the data. However, it lacks explainability in terms of providing comprehensive interpretations for the predictions, such as identifying the specific information that causes the data to be classified as phishing. To this end, we propose an innovative deep learning-based approach for email (the most common phishing way) phishing attack localization. Our method can not only predict the vulnerability of the email data but also automatically learn and figure out the most important and phishing-relevant information (i.e., sentences) in the phishing email data where the selected information indicates useful and concise explanations for the vulnerability. The rigorous experiments on seven real-world diverse email datasets show the effectiveness and advancement of our proposed method in selecting crucial information, offering concise explanations (by successfully figuring out the most important and phishing-relevant information) for the vulnerability of the phishing email data. Particularly, our method achieves a significantly higher performance, ranging from approximately 1.5% to 3.5%, compared to state-of-the-art baselines, as measured by the combined average performance of two main metrics Label-Accuracy and Cognitive-True-Positive.

View on arXiv
Comments on this paper