
KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval

Main: 7 pages · 1 figure · 4 tables · Bibliography: 2 pages · Appendix: 2 pages
Abstract

Understanding what emotions images evoke in their viewers is a foundational goal in human-centric visual computing. While recent advances in vision-language models (VLMs) have shown promise for visual emotion analysis (VEA), several key challenges remain unresolved. Emotional cues in images are often abstract, overlapping, and entangled, making them difficult to model and interpret. Moreover, VLMs struggle to align these complex visual patterns with emotional semantics due to limited supervision and sparse emotional grounding. Finally, existing approaches lack structured affective knowledge to resolve ambiguity and ensure consistent emotional reasoning across diverse visual domains.
