Linguistic Characteristics of Censorable Language on SinaWeibo

10 July 2018

Kei Yin Ng

Anna Feldman

Jing Peng

Chris Leberknight

Abstract

This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independent of discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
