Automated Sentiment Classification and Topic Discovery in Large-Scale Social Media Streams

We present a framework for large-scale sentiment and topic analysis of Twitter discourse. Our pipeline begins with targeted data collection using conflict-specific keywords, followed by automated sentiment labeling with multiple pre-trained models to improve annotation robustness. We examine the relationship between sentiment and contextual features such as timestamp, geolocation, and lexical content. To identify latent themes, we apply Latent Dirichlet Allocation (LDA) to subsets of the data partitioned by sentiment and metadata attributes. Finally, we develop an interactive visualization interface for exploring sentiment trends and topic distributions across time and regions. This work contributes a scalable methodology for social media analysis in dynamic geopolitical contexts.
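The two analytic steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how labels from the multiple models are combined, so the sketch assumes a simple majority vote that falls back to "neutral" on ties. It also does not name an LDA implementation, so it substitutes a plain collapsed Gibbs sampler written with the standard library. The function names (majority_label, partition, lda_top_words), the record fields ("text", "votes", "sentiment"), the hyperparameters, and the toy tweets are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def majority_label(votes):
    """Combine labels from several sentiment models by majority vote.

    Assumption (not from the paper): ties fall back to "neutral".
    `votes` must be non-empty.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


def partition(records, key):
    """Group records by an attribute such as sentiment, region, or day."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


def lda_top_words(docs, num_topics, top_n=5, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the top_n words per topic.

    `docs` is a list of token lists; the hyperparameters are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Randomly assign each token to an initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    return [sorted(vocab, key=lambda w: -n_kw[k][index[w]])[:top_n] for k in range(K)]


if __name__ == "__main__":
    # Hypothetical toy records: each "votes" list holds one label per model.
    tweets = [
        {"text": "ceasefire talks bring hope for peace", "votes": ["positive", "positive", "neutral"]},
        {"text": "peace talks resume with hope", "votes": ["positive", "neutral", "positive"]},
        {"text": "troops shell the border town again", "votes": ["negative", "negative", "negative"]},
        {"text": "border shelling kills civilians", "votes": ["negative", "negative", "neutral"]},
    ]
    for t in tweets:
        t["sentiment"] = majority_label(t["votes"])
    for label, items in partition(tweets, key=lambda t: t["sentiment"]).items():
        docs = [t["text"].lower().split() for t in items]
        print(label, lda_top_words(docs, num_topics=2, top_n=3))
```

Partitioning before fitting means each LDA model only sees one sentiment group, so topics describe what positive or negative posts discuss rather than mixing both. The same partition helper could group by region or time window instead. On a real corpus, a library implementation with proper preprocessing (stop-word removal, lemmatization) would normally replace this toy sampler.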
@article{lu2025_2505.01883,
  title={Automated Sentiment Classification and Topic Discovery in Large-Scale Social Media Streams},
  author={Yiwen Lu and Siheng Xiong and Zhaowei Li},
  journal={arXiv preprint arXiv:2505.01883},
  year={2025}
}