2103.16002
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
30 March 2021
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
Papers citing "AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning"
25 / 25 papers shown
Title
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
62
0
0
08 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
52
0
0
06 May 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
47
1
0
13 Jan 2025
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
103
4
0
12 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
79
2
0
20 Nov 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
158
1
0
30 Oct 2024
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao
Jiangtong Li
Li Niu
Liqing Zhang
CoGe
37
3
0
03 Jul 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
43
2
0
30 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
38
176
0
15 May 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
44
20
0
09 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
47
1
0
01 Apr 2024
DAM: Dynamic Adapter Merging for Continual Video QA Learning
Feng Cheng
Ziyang Wang
Yi-Lin Sung
Yan-Bo Lin
Mohit Bansal
Gedas Bertasius
CLL
MoMe
31
10
0
13 Mar 2024
YTCommentQA: Video Question Answerability in Instructional Videos
Saelyne Yang
Sunghyun Park
Yunseok Jang
Moontae Lee
23
3
0
30 Jan 2024
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
22
22
0
05 Oct 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
41
34
0
05 May 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
19
5
0
05 Apr 2023
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
16
3
0
19 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
29
17
0
05 Oct 2022
AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
28
14
0
12 Apr 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Fei Wu
Yi Yang
Yueting Zhuang
X. Wang
36
73
0
24 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
24
85
0
02 Mar 2022
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
154
290
0
14 Mar 2020
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
223
815
0
04 Apr 2018
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
329
11,684
0
09 Mar 2017