Social Blacklist Prediction using a Heterogeneous Information Network

Journal of Big Data (J Big Data), 2018

9 November 2018

Abstract

Blacklists are widely used in society to avoid interactions with inappropriate entities. For example, international organizations issue sanctions lists that prohibit trade with entities involved in illegal activities. Financial institutions keep blacklists of firms that have financial problems or environmental issues, compiling them from various news sources to keep their portfolios profitable as well as green. In this paper, we focus on the prediction of blacklists in the finance domain. We construct a vast heterogeneous information network that covers the necessary information surrounding each firm. It is assembled from seven professionally curated datasets and two open datasets and contains approximately 50 million nodes and 400 million edges in total. Exploiting this network, we propose a model that learns to predict which firms are more likely to be added to a blacklist in the near future. We test our approach on negative news blacklist data covering more than 35,000 firms worldwide from January 2012 to May 2018. Comparing against state-of-the-art methods with and without the network, we show that predictive accuracy improves substantially when the heterogeneous information network is used. This work suggests new ways to consolidate the diffuse information contained in big data to monitor dominant firms on a global scale, supporting better risk management and more socially responsible investment.
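
To make the idea of a heterogeneous information network concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a toy graph with typed nodes and typed edges, and computes one simple network-derived signal: the share of a firm's neighboring firms that are already blacklisted. The class `HeteroNetwork`, the function `blacklisted_neighbor_ratio`, and all node names and edge types are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class HeteroNetwork:
    """Toy heterogeneous network: every node and every edge carries a type."""

    def __init__(self):
        self.node_type = {}            # node id -> node type (e.g. "firm", "person")
        self.adj = defaultdict(list)   # node id -> list of (edge type, neighbor id)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, edge_type, dst):
        # Edges are stored in both directions (undirected for simplicity).
        self.adj[src].append((edge_type, dst))
        self.adj[dst].append((edge_type, src))

    def neighbors(self, node_id, edge_type=None):
        # Optionally restrict to a single edge type.
        return [n for t, n in self.adj.get(node_id, [])
                if edge_type is None or t == edge_type]


def blacklisted_neighbor_ratio(net, firm, blacklisted):
    """Fraction of a firm's neighboring firms that are already blacklisted."""
    firms = [n for n in net.neighbors(firm) if net.node_type[n] == "firm"]
    if not firms:
        return 0.0
    return sum(n in blacklisted for n in firms) / len(firms)


# Hypothetical example data: three firms and one person.
net = HeteroNetwork()
for name in ("firm_A", "firm_B", "firm_C"):
    net.add_node(name, "firm")
net.add_node("person_P", "person")
net.add_edge("firm_A", "supplies", "firm_B")
net.add_edge("firm_A", "supplies", "firm_C")
net.add_edge("person_P", "director_of", "firm_A")

ratio = blacklisted_neighbor_ratio(net, "firm_A", {"firm_B"})
print(ratio)
```

A feature like this could be fed to any classifier alongside firm-level attributes. The abstract does not describe the features or learning procedure the authors actually use, so this sketch only illustrates how typed relations around a firm might be turned into a predictive signal.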
