Cross-Lingual Sentiment Quantification

We discuss \emph{Cross-Lingual Text Quantification} (CLTQ), the task of performing text quantification (i.e., estimating the relative frequency of all classes in a set of unlabelled documents) when training documents are available for a source language but not for the target language for which quantification needs to be performed. CLTQ has never been discussed before in the literature; we establish baseline results for the binary case by combining state-of-the-art quantification methods with methods capable of generating cross-lingual vectorial representations of the source and target documents involved. We present experimental results obtained on publicly available datasets for cross-lingual sentiment classification; the results show that the presented methods can perform CLTQ with a surprising level of accuracy.
View on arXiv