2011.07735
Cited By
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
16 November 2020
Vasu Sharma
Gurneet Arora
Navpreet Kaloty
Papers citing
"iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering"
22 / 22 papers shown
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
258
0
0
19 Jun 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
457
5
0
16 Apr 2025
StoryNavi: On-Demand Narrative-Driven Reconstruction of Video Play With Generative AI
Alston Lantian Xu
Tianwei Ma
Tianmeng Liu
Can Liu
Alvaro Cassinelli
VGen
202
0
0
04 Oct 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
445
40
0
22 Apr 2024
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
IEEE International Conference on Computer Vision (ICCV), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
318
54
0
10 Oct 2023
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention
British Machine Vision Conference (BMVC), 2023
Burak Satar
Huaiyu Zhu
Hanwang Zhang
Joo-Hwee Lim
CML
250
0
0
17 Sep 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
IEEE International Conference on Computer Vision (ICCV), 2023
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S. Torr
Xiaoping Zhang
Yansong Tang
396
32
0
16 Aug 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
273
48
0
22 Apr 2023
AutoAD: Movie Description in Context
Computer Vision and Pattern Recognition (CVPR), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
300
57
0
29 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Shih-Han Chou
James J. Little
Leonid Sigal
222
5
0
14 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
586
358
0
27 Feb 2023
Video Question Answering with Iterative Video-Text Co-Tokenization
European Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
291
21
0
01 Aug 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
566
285
0
16 Jun 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
428
41
0
10 May 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
European Conference on Computer Vision (ECCV), 2022
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
561
36
0
08 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
360
119
0
02 Mar 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Computer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
380
126
0
13 Jan 2022
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
295
11
0
15 Dec 2021
Transferring Domain-Agnostic Knowledge in Video Question Answering
Tianran Wu
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Haruo Takemura
154
10
0
26 Oct 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
268
5
0
25 Jun 2021
On the hidden treasure of dialog in video question answering
IEEE International Conference on Computer Vision (ICCV), 2021
Deniz Engin
François Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
268
12
0
26 Mar 2021
Open-Ended Multi-Modal Relational Reasoning for Video Question Answering
IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2020
Haozheng Luo
Ruiyang Qin
Chenwei Xu
Guo Ye
Zening Luo
597
6
0
01 Dec 2020