arXiv:2307.12058
Discovering Spatio-Temporal Rationales for Video Question Answering
22 July 2023
Yicong Li
Junbin Xiao
Chun Feng
Xiang Wang
Tat-Seng Chua
Papers citing "Discovering Spatio-Temporal Rationales for Video Question Answering"
14 / 14 papers shown
Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
Zijie Song
Zhenzhen Hu
Yixiao Ma
Jia Li
Richang Hong
16
0
0
08 Apr 2025
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Zhiyuan Liu
Yanchen Luo
Han Huang
Enzhi Zhang
Sihang Li
Junfeng Fang
Yaorui Shi
X. Wang
Kenji Kawaguchi
Tat-Seng Chua
100
3
0
18 Feb 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
116
2
0
14 Jan 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei
Shengqiong Wu
Wei Ji
H. Zhang
M. Zhang
M. Lee
W. Hsu
LRM
VGen
44
55
0
08 Jan 2025
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
27
1
0
26 Dec 2024
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou
Junbin Xiao
Xun Yang
Peipei Song
Dan Guo
Angela Yao
Meng Wang
Tat-Seng Chua
52
1
0
22 Sep 2024
High-Order Evolving Graphs for Enhanced Representation of Traffic Dynamics
Aditya Humnabadkar
Arindam Sikdar
Benjamin Cave
Huaizhong Zhang
P. Bakaki
Ardhendu Behera
14
0
0
17 Sep 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
13
5
0
19 Jan 2024
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao
Angela Yao
Yicong Li
Tat-Seng Chua
25
46
0
04 Sep 2023
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
134
73
0
12 Jul 2022
Discovering Invariant Rationales for Graph Neural Networks
Yingxin Wu
Xiang Wang
An Zhang
Xiangnan He
Tat-Seng Chua
OOD
AI4CE
89
222
0
30 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
123
99
0
29 Apr 2021
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
30
22
0
24 Oct 2020