A 106K Multi-Topic Multilingual Conversational User Dataset with Emoticons

26 February 2025
Heng Er Metilda Chee
Jiayin Wang
Zhiqiang Guo
Weizhi Ma
Qinglang Guo
Min Zhang
Abstract

Instant messaging has become a predominant form of communication, with text and emoticons enabling users to express emotions and ideas efficiently. Emoticons in particular have gained significant traction as a medium for conveying sentiment and information, which has increased the importance of emoticon retrieval and recommendation systems. However, a key challenge in this area has been the absence of datasets that capture both the temporal dynamics of emoticon use and user-specific interactions with emoticons, which limits progress in personalized user modeling and recommendation. To address this, we introduce the emoticon dataset, a comprehensive resource that includes time-based data along with anonymous user identifiers across different conversations. As the largest publicly accessible emoticon dataset to date, it comprises 22K unique users, 370K emoticons, and 8.3M messages. The data were collected from a widely used messaging platform across 67 conversations and 720 hours of crawling. Strict privacy and safety checks were applied to ensure the integrity of both text and image data. Spanning 10 distinct domains, the emoticon dataset provides rich insights into temporal, multilingual, and cross-domain behaviors that were previously unavailable in other emoticon datasets. Our in-depth quantitative and qualitative experiments demonstrate the dataset's potential for modeling user behavior and building personalized recommendation systems, opening up new possibilities for research in personalized retrieval and conversational AI. The dataset is freely accessible.

@article{chee2025_2502.19108,
  title={A 106K Multi-Topic Multilingual Conversational User Dataset with Emoticons},
  author={Heng Er Metilda Chee and Jiayin Wang and Zhiqiang Guo and Weizhi Ma and Qinglang Guo and Min Zhang},
  journal={arXiv preprint arXiv:2502.19108},
  year={2025}
}