VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

Recent studies have demonstrated the effectiveness of Large Language Models (LLMs) as reasoning modules that can deconstruct complex tasks into more manageable sub-tasks, particularly when applied to visual reasoning tasks for images. Extending this line of work, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs. Our approach extends the utility of LLMs to video tasks, leveraging their capacity to generalize from minimal input-output demonstrations provided in context. We harness their in-context learning capabilities by presenting LLMs with pairs of instructions and their corresponding high-level programs, prompting them to generate executable visual programs for video understanding. To enhance the accuracy and robustness of the generated programs, we employ two strategies. \emph{First}, we use a feedback-generation stage, powered by GPT-3.5, to rectify programs that call unsupported functions. \emph{Second}, motivated by recent work on self-refinement of LLM outputs, we introduce an iterative procedure that improves the quality of the in-context examples by aligning the initial outputs with the outputs the LLM would have produced had it not been constrained by the structure of the in-context examples. Our results on several video-specific tasks, including visual QA, video anticipation, pose estimation, and multi-video QA, demonstrate the efficacy of these enhancements in improving visual programming approaches for video tasks.
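To make the generate-then-correct loop described above concrete, below is a minimal Python sketch of the pipeline: the LLM is prompted with instruction/program pairs, the generated program is scanned for unsupported function calls, and detected errors are fed back to the LLM for correction. The `call_llm` wrapper, the toy in-context examples, and the set of supported video functions are all illustrative assumptions, not the paper's actual prompt or API.

```python
import re

# Hypothetical in-context examples: instruction/program pairs (not from the paper).
IN_CONTEXT_EXAMPLES = """\
Instruction: How many people enter the room?
Program:
frames = extract_frames(video)
people = detect(frames, "person entering")
answer = count(people)

Instruction: What is the man holding after he sits down?
Program:
event = locate_event(video, "man sits down")
clip = trim_after(video, event)
answer = query(clip, "What is the man holding?")
"""

# Toy whitelist of executable video functions; a placeholder for the framework's real API.
SUPPORTED_FUNCTIONS = {"extract_frames", "detect", "count",
                       "locate_event", "trim_after", "query"}


def call_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM (e.g., GPT-3.5)."""
    raise NotImplementedError


def unsupported_calls(program: str) -> set[str]:
    """Return names of functions the program calls that are not supported."""
    called = set(re.findall(r"(\w+)\s*\(", program))
    return called - SUPPORTED_FUNCTIONS


def generate_program(instruction: str, max_retries: int = 2) -> str:
    """Generate a visual program, then correct unsupported calls via LLM feedback."""
    prompt = f"{IN_CONTEXT_EXAMPLES}\nInstruction: {instruction}\nProgram:\n"
    program = call_llm(prompt)
    for _ in range(max_retries):
        bad = unsupported_calls(program)
        if not bad:
            break
        # Feedback step: report the unsupported functions and ask for a rewrite.
        feedback = (f"The program uses unsupported functions {sorted(bad)}. "
                    f"Rewrite it using only: {sorted(SUPPORTED_FUNCTIONS)}.\n"
                    f"Program:\n{program}\nCorrected program:\n")
        program = call_llm(feedback)
    return program
```

The paper's self-refinement stage would additionally rewrite the in-context examples themselves over iterations; the loop above only covers the error-feedback step.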
@article{mahmood2025_2403.14743,
  title={VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding},
  author={Ahmad Mahmood and Ashmal Vayani and Muzammal Naseer and Salman Khan and Fahad Shahbaz Khan},
  journal={arXiv preprint arXiv:2403.14743},
  year={2025}
}