Generating responsive listener head dynamics with nuanced emotions and expressive reactions is crucial for practical dialogue modeling in various virtual avatar animations. Previous studies mainly focus on directly producing short-term listener behavior and overlook fine-grained control over motion variation and emotional intensity, especially in long-sequence modeling. Moreover, the lack of long-term, large-scale paired speaker-listener corpora containing head dynamics and fine-grained multi-modal annotations (e.g., text-based expression descriptions, emotional intensity) further limits the application of dialogue modeling. To this end, we first collect a large-scale multi-turn dataset of 3D dyadic conversation containing more than 1.4M valid frames for multi-modal responsive interaction, dubbed ListenerX. Additionally, we propose VividListener, a novel framework enabling fine-grained, expressive, and controllable modeling of listener dynamics. This framework leverages multi-modal conditions as guiding principles to foster coherent interactions between speakers and listeners. Specifically, we design the Responsive Interaction Module (RIM) to adaptively represent multi-modal interactive embeddings. RIM ensures that the listener dynamics achieve fine-grained semantic coordination with textual descriptions and adjustments, while preserving expressive reactions to speaker behavior. Meanwhile, we design Emotional Intensity Tags (EIT) for emotion-intensity editing with multi-modal information integration, applied to both text descriptions and listener motion. Extensive experiments on our newly collected ListenerX dataset demonstrate that VividListener achieves state-of-the-art performance, realizing expressive and controllable listener dynamics.
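The abstract describes RIM as fusing speaker behavior, textual descriptions, and emotional-intensity tags into interactive embeddings that condition listener motion. The paper's actual architecture is not given here, so the following is only a minimal illustrative sketch of that kind of multi-modal conditioning; all class names, feature shapes, the cross-attention fusion, and the assumed five intensity levels are assumptions for intuition, not the authors' implementation.

```python
# Illustrative sketch only: names, shapes, and the fusion scheme are assumptions,
# not the VividListener implementation.
import torch
import torch.nn as nn


class ResponsiveInteractionSketch(nn.Module):
    """Hypothetical RIM-style fusion: listener queries cross-attend over
    speaker-motion features, text-description tokens, and an emotion-intensity tag."""

    def __init__(self, dim: int = 256, heads: int = 4, num_intensity_levels: int = 5):
        super().__init__()
        self.speaker_proj = nn.Linear(dim, dim)      # speaker head-dynamics features
        self.text_proj = nn.Linear(dim, dim)         # text-based expression description
        self.intensity_embed = nn.Embedding(num_intensity_levels, dim)  # assumed discrete EIT levels
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, listener_q, speaker_feat, text_feat, intensity_tag):
        # Build one multi-modal condition sequence: [speaker frames; text tokens; intensity tag].
        cond = torch.cat(
            [
                self.speaker_proj(speaker_feat),
                self.text_proj(text_feat),
                self.intensity_embed(intensity_tag).unsqueeze(1),
            ],
            dim=1,
        )
        # Listener queries attend over the fused conditions to form interactive embeddings.
        fused, _ = self.cross_attn(listener_q, cond, cond)
        return self.out(fused)


if __name__ == "__main__":
    B, T_listener, T_speaker, T_text, D = 2, 32, 32, 8, 256
    rim = ResponsiveInteractionSketch(dim=D)
    out = rim(
        listener_q=torch.randn(B, T_listener, D),
        speaker_feat=torch.randn(B, T_speaker, D),
        text_feat=torch.randn(B, T_text, D),
        intensity_tag=torch.randint(0, 5, (B,)),
    )
    print(out.shape)  # torch.Size([2, 32, 256])
```

In this sketch the intensity tag is just one extra token in the condition sequence, which is one plausible way a discrete EIT could steer generated motion; the paper may integrate it differently.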
@article{li2025_2504.21718,
  title={VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction},
  author={Shiying Li and Xingqun Qi and Bingkun Yang and Chen Weile and Zezhao Tian and Muyi Sun and Qifeng Liu and Man Zhang and Zhenan Sun},
  journal={arXiv preprint arXiv:2504.21718},
  year={2025}
}