1611.04021
Leveraging Video Descriptions to Learn Video Question Answering
12 November 2016
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
Papers citing
"Leveraging Video Descriptions to Learn Video Question Answering"
50 / 84 papers shown
TextVidBench: A Benchmark for Long Video Scene Text Understanding
Yangyang Zhong
Ji Qi
Yuan Yao
Pengxin Luo
Yunfeng Yan
Donglian Qi
Zhiyuan Liu
Tat-Seng Chua
346
0
0
05 Jun 2025
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
Ji Qi
Yuan Yao
Yushi Bai
Bin Xu
Juanzi Li
Zhiyuan Liu
Tat-Seng Chua
313
5
0
21 Apr 2025
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
1.1K
0
0
18 Feb 2025
Progress-Aware Video Frame Captioning
Computer Vision and Pattern Recognition (CVPR), 2024
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
687
7
0
03 Dec 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
296
0
0
12 Nov 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
328
10
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
IEEE Transactions on Image Processing (TIP), 2024
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
266
10
0
12 Oct 2024
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Hung-Ting Su
Chun-Tong Chao
Ya-Ching Hsu
Xudong Lin
Yulei Niu
Hung-Yi Lee
Winston H. Hsu
LRM
250
1
0
16 Jun 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
439
66
0
13 Jun 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He
Weixi Feng
Kaizhi Zheng
Yujie Lu
Wanrong Zhu
...
Zhengyuan Yang
Kevin Lin
William Yang Wang
Lijuan Wang
Xin Eric Wang
VGen
LRM
792
37
0
12 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
633
34
1
09 Jun 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
382
13
0
01 Apr 2024
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
210
0
0
20 Dec 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
451
7
0
02 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
429
79
0
01 Nov 2023
Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions
Autonomous Robots (Auton. Robots), 2023
Chad DeChant
Iretiayo Akinola
Daniel Bauer
240
13
0
16 Jun 2023
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Vaishnavi Himakunthala
Andy Ouyang
Daniel Philip Rose
Ryan He
Alex Mei
Yujie Lu
Chinmay Sonar
Michael Stephen Saxon
William Y. Wang
MLLM
LRM
348
2
0
23 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Oğuz
Yashar Mehdad
Anuj Kumar
VGen
196
4
0
04 May 2023
ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Zhou Yu
Lixiang Zheng
Zhou Zhao
A. Fedoseev
Jianping Fan
Kui Ren
Jun Yu
CoGe
361
23
0
04 May 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
250
46
0
22 Apr 2023
Learning Situation Hyper-Graphs for Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2023
Aisha Urooj Khan
Hilde Kuehne
Bo Wu
Kim Chheu
Walid Bousselham
Chuang Gan
N. Lobo
M. Shah
272
23
0
18 Apr 2023
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Hung-Ting Su
Yulei Niu
Xudong Lin
Winston H. Hsu
Shih-Fu Chang
VGen
ELM
319
13
0
07 Apr 2023
Connecting Vision and Language with Video Localized Narratives
Computer Vision and Pattern Recognition (CVPR), 2023
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
397
31
0
22 Feb 2023
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Computer Vision and Pattern Recognition (CVPR), 2023
Razvan-George Pasca
Alexey Gavryushin
Muhammad Hamza
Yen-Ling Kuo
Kaichun Mo
Luc Van Gool
Otmar Hilliges
Xi Wang
572
23
0
22 Jan 2023
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
372
2
0
08 Oct 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Neural Information Processing Systems (NeurIPS), 2022
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
274
107
0
08 Oct 2022
M^4I: Multi-modal Models Membership Inference
Neural Information Processing Systems (NeurIPS), 2022
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
241
38
0
15 Sep 2022
WildQA: In-the-Wild Video Question Answering
International Conference on Computational Linguistics (COLING), 2022
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Amélie Reymond
359
9
0
14 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
347
200
0
07 Sep 2022
Equivariant and Invariant Grounding for Video Question Answering
ACM Multimedia (ACM MM), 2022
Yicong Li
Xiang Wang
Junbin Xiao
Tat-Seng Chua
228
36
0
26 Jul 2022
Invariant Grounding for Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2022
Yicong Li
Xiang Wang
Junbin Xiao
Wei Ji
Tat-Seng Chua
OOD
245
116
0
06 Jun 2022
Learning to Retrieve Videos by Asking Questions
ACM Multimedia (ACM MM), 2022
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
347
19
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
408
40
0
10 May 2022
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Vikram Gupta
Trisha Mittal
Puneet Mathur
Vaibhav Mishra
Mayank Maheshwari
Aniket Bera
Debdoot Mukherjee
Tianyi Zhou
VGen
297
14
0
28 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
358
118
0
02 Mar 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2022
Pranay Gupta
Manish Gupta
305
9
0
08 Feb 2022
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
391
140
0
12 Dec 2021
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Hariom A. Pandya
Brijesh S. Bhatt
213
34
0
07 Dec 2021
Simple Dialogue System with AUDITED
British Machine Vision Conference (BMVC), 2021
Eugenio Clerico
Piotr Koniusz
218
2
0
22 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
326
121
0
11 Oct 2021
TrUMAn: Trope Understanding in Movies and Animations
International Conference on Information and Knowledge Management (CIKM), 2021
Hung-Ting Su
Po-Wei Shen
Bing-Chen Tsai
Wen-Feng Cheng
Ke-Jyun Wang
Winston H. Hsu
193
6
0
10 Aug 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
244
5
0
25 Jun 2021
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Computer Vision and Pattern Recognition (CVPR), 2021
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
490
776
0
18 May 2021
Relation-aware Hierarchical Attention Framework for Video Question Answering
International Conference on Multimedia Retrieval (ICMR), 2021
Fangtao Li
Ting Bai
Chenyu Cao
Zihe Liu
C. Yan
Bin Wu
266
14
0
13 May 2021
Video Question Answering with Phrases via Semantic Roles
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Arka Sadhu
Kan Chen
Ram Nevatia
203
16
0
08 Apr 2021
Visual Semantic Role Labeling for Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Arka Sadhu
Tanmay Gupta
Mark Yatskar
Ram Nevatia
Aniruddha Kembhavi
VLM
426
91
0
02 Apr 2021
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Computer Vision and Pattern Recognition (CVPR), 2021
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
318
151
0
30 Mar 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Computer Vision and Pattern Recognition (CVPR), 2021
Kepeng Xu
He Huang
Jun Liu
ViT
LRM
538
116
0
29 Mar 2021
On Semantic Similarity in Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2021
Michael Wray
Hazel Doughty
Dima Damen
297
78
0
18 Mar 2021
Narration Generation for Cartoon Videos
Nikos Papasarantopoulos
Shay B. Cohen
VGen
226
2
0
17 Jan 2021