ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.09118
16
0

Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning

14 May 2025
Dayong Liang
Changmeng Zheng
Zhiyuan Wen
Yi Cai
Xiao Wei
Qing Li
    LRM
ArXivPDFHTML
Abstract

Traditional scene graphs primarily focus on spatial relationships, limiting vision-language models' (VLMs) ability to reason about complex interactions in visual scenes. This paper addresses two key challenges: (1) conventional detection-to-construction methods produce unfocused, contextually irrelevant relationship sets, and (2) existing approaches fail to form persistent memories for generalizing interaction reasoning to new scenes. We propose Interaction-augmented Scene Graph Reasoning (ISGR), a framework that enhances VLMs' interactional reasoning through three complementary components. First, our dual-stream graph constructor combines SAM-powered spatial relation extraction with interaction-aware captioning to generate functionally salient scene graphs with spatial grounding. Second, we employ targeted interaction queries to activate VLMs' latent knowledge of object functionalities, converting passive recognition into active reasoning about how objects work together. Finally, we introduce a lone-term memory reinforcement learning strategy with a specialized interaction-focused reward function that transforms transient patterns into long-term reasoning heuristics. Extensive experiments demonstrate that our approach significantly outperforms baseline methods on interaction-heavy reasoning benchmarks, with particularly strong improvements on complex scene understanding tasks. The source code can be accessed atthis https URL.

View on arXiv
@article{liang2025_2505.09118,
  title={ Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning },
  author={ Dayong Liang and Changmeng Zheng and Zhiyuan Wen and Yi Cai and Xiao-Yong Wei and Qing Li },
  journal={arXiv preprint arXiv:2505.09118},
  year={ 2025 }
}
Comments on this paper