When Claims Evolve: Evaluating and Enhancing the Robustness of Embedding Models Against Misinformation Edits

5 March 2025

Abstract

Online misinformation remains a critical challenge, and fact-checkers increasingly rely on embedding-based methods to retrieve relevant fact-checks. Yet, when debunked claims reappear in edited forms, the performance of these methods is unclear. In this work, we introduce a taxonomy of six common real-world misinformation edits and propose a perturbation framework that generates valid, natural claim variations. Our multi-stage retrieval evaluation reveals that standard embedding models struggle with user-introduced edits, while LLM-distilled embeddings offer improved robustness at a higher computational cost. Although a strong reranker helps mitigate some issues, it cannot fully compensate for first-stage retrieval gaps. Addressing these retrieval gaps, our train- and inference-time mitigation approaches enhance in-domain robustness by up to 17 percentage points and boost out-of-domain generalization by 10 percentage points over baseline models. Overall, our findings provide practical improvements to claim-matching systems, enabling more reliable fact-checking of evolving misinformation.

View on arXiv

@article{magomere2025_2503.03417,
  title={ When Claims Evolve: Evaluating and Enhancing the Robustness of Embedding Models Against Misinformation Edits },
  author={ Jabez Magomere and Emanuele La Malfa and Manuel Tonneau and Ashkan Kazemi and Scott Hale },
  journal={arXiv preprint arXiv:2503.03417},
  year={ 2025 }
}

Comments on this paper