arXiv:2002.10698, v3 (latest)
Hierarchical Conditional Relation Networks for Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2020
25 February 2020
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
Papers citing "Hierarchical Conditional Relation Networks for Video Question Answering"
50 / 161 papers shown
GHR-VQA: Graph-guided Hierarchical Relational Reasoning for Video Question Answering
Dionysia Danai Brilli
Dimitrios Mallis
Vassilis Pitsikalis
Petros Maragos
216
0
0
25 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
245
1
0
11 Nov 2025
SRNN: Spatiotemporal Relational Neural Network for Intuitive Physics Understanding
Fei Yang
203
0
0
10 Nov 2025
Semantic Frame Aggregation-based Transformer for Live Video Comment Generation
IEEE transactions on multimedia (TMM), 2025
Anam Fatima
Yi Yu
Janak Kapuriya
Julien Lalanne
Jainendra Shukla
221
0
0
30 Oct 2025
AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
Jiayu Zhang
Qilang Ye
Shuo Ye
Xun Lin
Zihan Song
Zitong Yu
167
0
0
21 Oct 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
Kunyu Peng
Di Wen
Jia Fu
Jiamin Wu
Kailun Yang
...
Yufan Chen
Yuqian Fu
D. Paudel
Luc Van Gool
Rainer Stiefelhagen
174
0
0
18 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
250
1
0
12 Oct 2025
Traffic-MLLM: Curiosity-Regularized Supervised Learning for Traffic Scenario Case-Based Reasoning
Waikit Xiu
Qiang Lu
Xiying Li
Chen Hu
Shengbo Sun
LRM
107
1
0
14 Sep 2025
ChainReaction: Causal Chain-Guided Reasoning for Modular and Explainable Causal-Why Video Question Answering
Paritosh Parmar
Eric Peh
Basura Fernando
VGen
LRM
218
0
0
28 Aug 2025
Leveraging Static Relationships for Intra-Type and Inter-Type Message Passing in Video Question Answering
Lili Liang
Guanglu Sun
328
1
0
03 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
Jie Ma
Zhitao Gao
Qi Chai
Jing Liu
Peijie Wang
Jing Tao
Zhou Su
442
6
0
01 Apr 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
510
10
0
06 Mar 2025
EgoLife: Towards Egocentric Life Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Jingkang Yang
Shuai Liu
Hongming Guo
Yuhao Dong
Xinyu Zhang
...
Joerg Widmer
Francesco Gringoli
Lei Yang
Bo Li
Ziwei Liu
EgoV
337
12
0
05 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
405
0
0
11 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
484
3
0
31 Dec 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
422
7
0
20 Nov 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
327
0
0
12 Nov 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Wei Wei
402
2
0
07 Nov 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
340
10
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
IEEE Transactions on Image Processing (TIP), 2024
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
273
11
0
12 Oct 2024
VideoQA in the Era of LLMs: An Empirical Study
International Journal of Computer Vision (IJCV), 2024
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
418
29
0
08 Aug 2024
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
ACM Multimedia (MM), 2024
Chaolei Tan
Zihang Lin
Junfu Pu
Chen Ma
Wei-Yi Pei
Zhi Qu
Yexin Wang
Ying Shan
Wei-Shi Zheng
Jianfang Hu
AI4TS
477
3
0
03 Aug 2024
Compositional Physical Reasoning of Objects and Events from Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhenfang Chen
Shilong Dong
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
Joshua B. Tenenbaum
Chuang Gan
OCL
464
10
0
02 Aug 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
265
19
0
30 Jul 2024
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao
Jiangtong Li
Li Niu
Liqing Zhang
CoGe
247
15
0
03 Jul 2024
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
450
4
0
02 Jul 2024
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Zhe Yang
Wenrui Li
Guanghui Cheng
Mamba
307
6
0
14 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
648
39
1
09 Jun 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
590
279
0
15 May 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
669
22
0
18 Apr 2024
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
Lili Liang
Guanglu Sun
Jin Qiu
Lizhong Zhang
NAI
240
6
0
05 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
425
13
0
01 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
271
5
0
01 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
344
37
0
26 Mar 2024
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
Tianming Liang
Chaolei Tan
Beihao Xia
Wei-Shi Zheng
Jianfang Hu
334
2
0
21 Mar 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
421
10
0
18 Mar 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
275
4
0
11 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
European Conference on Computer Vision (ECCV), 2024
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Juil Sock
Simeng Qin
MLLM
489
54
0
07 Mar 2024
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang
Lei-lei Li
Junfei Zhou
Junbin Xiao
Hongkai Yu
Chen Lv
Jianru Xue
Tat-Seng Chua
346
50
0
01 Mar 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
301
3
0
25 Feb 2024
M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation
Hongcheng Liu
Pingjie Wang
Yu Wang
Yanfeng Wang
357
4
0
19 Feb 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
Zhicheng Zheng
Xin Yan
Zhenfang Chen
Jingzhou Wang
Qin Zhi Eddie Lim
Joshua B. Tenenbaum
Chuang Gan
LRM
243
19
0
09 Feb 2024
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks
Xingning Dong
Qingpei Guo
Tian Gan
Qing Wang
Yue Yu
Xiangyuan Ren
Yuan Cheng
Wei Chu
257
6
0
31 Jan 2024
CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process
International Conference on Machine Learning (ICML), 2024
Guan-Hong Chen
Yifan Shen
Zhenhao Chen
Xiangchen Song
Yuewen Sun
Weiran Yao
Xiao Liu
Kun Zhang
CML
358
20
0
25 Jan 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
468
17
0
19 Jan 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yueqian Wang
Yuxuan Wang
Kai Chen
Dongyan Zhao
245
2
0
08 Jan 2024
Context-Guided Spatio-Temporal Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Xin Gu
Hengrui Fan
Yan Huang
Tiejian Luo
Libo Zhang
387
43
0
03 Jan 2024
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Neural Information Processing Systems (NeurIPS), 2023
Ziyi Bai
Ruiping Wang
Xilin Chen
394
15
0
03 Jan 2024
Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Haopeng Li
Qiuhong Ke
Mingming Gong
Tom Drummond
333
2
0
03 Jan 2024
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
232
0
0
20 Dec 2023