PhishLang: A Real-Time, Fully Client-Side Phishing Detection Framework Using MobileBERT

11 August 2024

Abstract

In this paper, we introduce PhishLang, the first fully client-side anti-phishing framework. It is built on a lightweight ensemble that uses advanced language models to analyze the contextual features of a website's source code and URL. Traditional heuristic and machine learning approaches rely on static features and struggle to adapt to evolving threats, while deep learning models are computationally intensive. Our approach instead uses MobileBERT, a fast and memory-efficient variant of the BERT architecture, to capture nuanced features indicative of phishing attacks. To further improve detection accuracy, PhishLang employs a multi-modal ensemble that combines the URL and source-code detection models. This architecture ensures robustness by allowing one model to compensate when the other fails, and by handling cases in which both models produce ambiguous inferences. As a result, PhishLang excels at detecting both regular and evasive phishing threats, including zero-day attacks, and outperforms popular anti-phishing tools. It operates without relying on external blocklists and safeguards user privacy by keeping browser history entirely local and unshared. We release PhishLang as a Chromium browser extension and open-source the framework to support the research community.
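To make the ensemble idea concrete, the following is a minimal Python sketch of how two per-model phishing probabilities might be fused. The abstract does not specify PhishLang's actual combination rule, so everything here is an assumption for illustration: the names `ensemble`, `is_ambiguous`, and `Verdict`, the ambiguity band of 0.4 to 0.6, and the 0.5 decision threshold are all hypothetical. The model outputs are taken as already-computed probabilities rather than real MobileBERT inferences.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str    # "phishing", "benign", or "uncertain"
    score: float  # combined phishing probability in [0, 1]


def is_ambiguous(p: float, low: float = 0.4, high: float = 0.6) -> bool:
    """Treat a probability inside the (hypothetical) band [low, high] as ambiguous."""
    return low <= p <= high


def ensemble(url_prob: float, source_prob: float, threshold: float = 0.5) -> Verdict:
    """Hypothetical fusion rule, NOT the authors' published method.

    url_prob    -- phishing probability from the URL model (assumed precomputed)
    source_prob -- phishing probability from the source-code model (assumed precomputed)
    """
    url_amb = is_ambiguous(url_prob)
    src_amb = is_ambiguous(source_prob)

    if url_amb and not src_amb:
        # The confident source-code model compensates for an unsure URL model.
        score = source_prob
    elif src_amb and not url_amb:
        # The confident URL model compensates, e.g. for an obfuscated page source.
        score = url_prob
    else:
        # Both confident or both ambiguous: fall back to a simple average.
        score = (url_prob + source_prob) / 2

    if url_amb and src_amb:
        # Neither model is sure, so surface the ambiguity instead of guessing.
        return Verdict("uncertain", score)
    return Verdict("phishing" if score >= threshold else "benign", score)
```

For example, `ensemble(0.5, 0.9)` defers to the confident source-code model and labels the page as phishing, whereas `ensemble(0.45, 0.55)` returns an "uncertain" verdict because both inferences fall inside the ambiguity band. Deferring to whichever model is confident is only one simple way to let one model compensate for the other; weighted averaging or a learned meta-classifier are equally plausible alternatives.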


@article{roy2025_2408.05667,
  title={PhishLang: A Real-Time, Fully Client-Side Phishing Detection Framework Using MobileBERT},
  author={Sayak Saha Roy and Shirin Nilizadeh},
  journal={arXiv preprint arXiv:2408.05667},
  year={2025}
}
