ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.09711
  4. Cited By
STAR: A Benchmark for Situated Reasoning in Real-World Videos

STAR: A Benchmark for Situated Reasoning in Real-World Videos

15 May 2024
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
ArXivPDFHTML

Papers citing "STAR: A Benchmark for Situated Reasoning in Real-World Videos"

50 / 139 papers shown
Title
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
44
0
0
08 May 2025
R^3-VQA: "Read the Room" by Video Social Reasoning
R^3-VQA: "Read the Room" by Video Social Reasoning
Lixing Niu
Jiapeng Li
Xingping Yu
Shu Wang
Ruining Feng
Bo Wu
Ping Wei
Y. Wang
Lifeng Fan
36
0
0
07 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
32
0
0
06 May 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Chenkai Zhang
Yiming Lei
Z. Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
46
0
0
30 Apr 2025
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
David Ma
Y. Zhang
J. Ren
Jarvis Guo
Yifan Yao
...
Shiwen Ni
J. H. Liu
Wenhao Huang
Ge Zhang
Xiaojie Jin
VLM
35
0
0
21 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
Zheyuan Zhang
Monica Dou
Linkai Peng
Hongyi Pan
Ulas Bagci
Boqing Gong
VLM
54
0
0
12 Apr 2025
How Can Objects Help Video-Language Understanding?
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
33
0
0
10 Apr 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
16
0
0
08 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
32
0
0
07 Apr 2025
InstructionBench: An Instructional Video Understanding Benchmark
InstructionBench: An Instructional Video Understanding Benchmark
Haiwan Wei
Yitian Yuan
Xiaohan Lan
Wei Ke
Lin Ma
ELM
24
0
0
07 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian-Yu Guan
Wei Yu Wu
Rui Yan
VLM
40
0
0
03 Apr 2025
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding
Qi Wu
Quanlong Zheng
Yanhao Zhang
Junlin Xie
Jinguo Luo
...
Peng Liu
Qingsong Xie
Ru Zhen
Haonan Lu
Zhenyu Yang
VLM
58
0
0
31 Mar 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Y. Wang
Y. Wang
Bo Chen
Tong Wu
Dongyan Zhao
Zilong Zheng
VLM
MLLM
49
1
0
29 Mar 2025
Video-VoT-R1: An efficient video inference model integrating image packing and AoE architecture
Video-VoT-R1: An efficient video inference model integrating image packing and AoE architecture
Cheng Li
Jiexiong Liu
Yixuan Chen
Yanqin Jia
MLLM
VLM
69
0
0
20 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
W. Cheng
Gang Yu
Tao Chen
79
0
0
19 Mar 2025
VITED: Video Temporal Evidence Distillation
VITED: Video Temporal Evidence Distillation
Yujie Lu
Yale Song
William Yang Wang
Lorenzo Torresani
Tushar Nagarajan
41
0
0
17 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
50
0
0
12 Mar 2025
Towards Fine-Grained Video Question Answering
Wei Dai
Alan Luo
Zane Durante
Debadutta Dash
Arnold Milstein
Kevin Schulman
Ehsan Adeli
L. Fei-Fei
50
1
0
10 Mar 2025
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Baining Zhao
Jianjie Fang
Zichao Dai
Z. Wang
Jirong Zha
...
Chen Gao
Y. Wang
Jinqiang Cui
Xinlei Chen
Y. Li
43
2
0
08 Mar 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen
Y. Liu
Binglin Chen
Jiandong Su
Yongsen Zheng
Liang Lin
BDL
VGen
CML
41
2
0
05 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
67
3
0
26 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
48
0
0
30 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
37
1
0
13 Jan 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei
Shengqiong Wu
Wei Ji
H. Zhang
M. Zhang
M. Lee
W. Hsu
LRM
VGen
39
55
0
08 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou
Yan Shu
Bo Zhao
Boya Wu
Zhengyang Liang
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
45
33
0
03 Jan 2025
When SAM2 Meets Video Shadow and Mirror Detection
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
27
0
0
26 Dec 2024
VidCtx: Context-aware Video Question Answering with Image Models
VidCtx: Context-aware Video Question Answering with Image Models
Andreas Goulas
Vasileios Mezaris
Ioannis Patras
50
0
0
23 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
102
1
0
20 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?
Xi Ding
Lei Wang
149
0
0
18 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
94
4
0
12 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach
  to Video Question Answering
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
67
0
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
92
1
0
12 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
J. H. Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
93
2
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
86
2
0
01 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
87
1
0
21 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual
  Reasoning Scenarios
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
60
2
0
20 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Saeed Mian
Mohit Bansal
Chen Chen
LRM
46
1
0
15 Nov 2024
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges
  in Video Cognition
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition
Chenglin Li
Qianglong Chen
Zhi Li
Feng Tao
Yin Zhang
29
0
0
14 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
26
31
0
07 Nov 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
43
1
0
30 Oct 2024
Addressing Blind Guessing: Calibration of Selection Bias in
  Multiple-Choice Question Answering by Video Language Models
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
Olga Loginova
Oleksandr Bezrukov
Alexey Kravets
11
0
0
18 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of
  MLLMs
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
11
3
0
16 Oct 2024
Mars: Situated Inductive Reasoning in an Open-World Environment
Mars: Situated Inductive Reasoning in an Open-World Environment
Xiaojuan Tang
Jiaqi Li
Yitao Liang
Song-chun Zhu
Muhan Zhang
Zilong Zheng
LM&Ro
LRM
LLMAG
13
1
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
42
25
0
10 Oct 2024
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
44
5
0
04 Oct 2024
Video Instruction Tuning With Synthetic Data
Video Instruction Tuning With Synthetic Data
Yuanhan Zhang
Jinming Wu
Wei Li
Bo Li
Zejun Ma
Ziwei Liu
Chunyuan Li
SyDa
VGen
34
136
0
03 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
36
32
1
30 Sep 2024
123
Next