
From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection

7 February 2026

Mo Wang

Kaixuan Ren

Pratik Jalan

Ahmed Ashraf

Tuong Vy Vu

Rahul Seetharaman

Shah Nawaz

Usman Naseem

VLM


Main: 9 pages

6 figures

Bibliography: 2 pages

6 tables

Appendix: 1 page

Abstract

Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits their fairness and cross-cultural robustness in tasks like hateful meme detection. We introduce a systematic evaluation framework designed to diagnose and quantify the cross-cultural robustness of state-of-the-art VLMs across multilingual meme datasets, analyzing three axes: (i) learning strategy (zero-shot vs. one-shot), (ii) prompting language (native vs. English), and (iii) translation effects on meaning and detection. Results show that the common "translate-then-detect" approach degrades performance, while culturally aligned interventions (native-language prompting and one-shot learning) significantly enhance detection. Our findings reveal systematic convergence toward Western safety norms and provide actionable strategies to mitigate such bias, guiding the design of globally robust multimodal moderation systems.
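As a rough illustration of how the three axes could be crossed into evaluation conditions, the sketch below enumerates every combination and scores a classifier under each one. It is a minimal sketch and not the authors' implementation: the full 2x2x2 grid, the macro-F1 metric, the `Condition` fields, the meme dictionary layout, and the `predict` callable (standing in for a VLM call) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    """One evaluation setting (hypothetical structure, not from the paper)."""
    strategy: str          # "zero-shot" or "one-shot"
    prompt_language: str   # "native" or "english"
    translated: bool       # True if meme text is translated to English first


def build_conditions():
    """Cross the three axes named in the abstract into a full grid (assumed)."""
    strategies = ("zero-shot", "one-shot")
    prompt_languages = ("native", "english")
    translation = (False, True)
    return [Condition(s, p, t)
            for s, p, t in product(strategies, prompt_languages, translation)]


def macro_f1(gold, predicted):
    """Macro-averaged F1 over the binary labels 0 (benign) and 1 (hateful)."""
    scores = []
    for label in (0, 1):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(memes, predict):
    """Score `predict(meme, condition) -> 0 or 1` under every condition.

    `memes` is a list of dicts with a "label" key (assumed layout);
    `predict` is a hypothetical wrapper around a VLM.
    """
    gold = [m["label"] for m in memes]
    return {
        cond: macro_f1(gold, [predict(m, cond) for m in memes])
        for cond in build_conditions()
    }
```

Comparing the scores for `translated=True` against `translated=False` at a fixed strategy and prompt language would isolate the translation effect, and the other two axes can be contrasted the same way.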
