Lost in translation: using global fact-checks to measure multilingual misinformation prevalence, spread, and evolution
Misinformation and disinformation are growing threats in the digital age, affecting people across languages and borders. However, no research has investigated the prevalence of multilingual misinformation and quantified the extent to which misinformation diffuses across languages. This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of 264,487 fact-checks spanning 95 languages. To study the evolution of claims over time and mutations across languages, we represent fact-checks with multilingual sentence embeddings and build a graph where semantically similar claims are linked. We provide quantitative evidence of repeated fact-checking efforts and establish that claims diffuse across languages. Specifically, we find that while the majority of misinformation claims are only fact-checked once, 10.26%, corresponding to more than 27,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 32.26% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers. However, spreading patterns exhibit strong assortativity, with misinformation more likely to spread within the same language or language family. Next we show that fact-checkers take more time to fact-check claims that have crossed language barriers and model the temporal and cross-lingual evolution of claims. We analyze connected components and shortest paths connecting different versions of a claim finding that claims gradually drift over time and undergo greater alteration when traversing languages. Misinformation changes over time, reducing the effectiveness of static claim matching algorithms. The findings advocate for expanded information sharing between fact-checkers globally while underscoring the importance of localized verification.
View on arXiv