ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2011.07735
  4. Cited By
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video
  Captioning and Video Question Answering

iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering

16 November 2020
Vasu Sharma
Gurneet Arora
Navpreet Kaloty
ArXiv (abs)PDFHTML

Papers citing "iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering"

22 / 22 papers shown
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
258
0
0
19 Jun 2025
FocusedAD: Character-centric Movie Audio Description
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
457
5
0
16 Apr 2025
StoryNavi: On-Demand Narrative-Driven Reconstruction of Video Play With
  Generative AI
StoryNavi: On-Demand Narrative-Driven Reconstruction of Video Play With Generative AI
Alston Lantian Xu
Tianwei Ma
Tianmeng Liu
Can Liu
Alvaro Cassinelli
VGen
202
0
0
04 Oct 2024
AutoAD III: The Prequel -- Back to the Pixels
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGenDiffM
445
40
0
22 Apr 2024
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
AutoAD II: The Sequel -- Who, When, and What in Movie Audio DescriptionIEEE International Conference on Computer Vision (ICCV), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGenDiffM
318
54
0
10 Oct 2023
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal
  Intervention
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal InterventionBritish Machine Vision Conference (BMVC), 2023
Burak Satar
Huaiyu Zhu
Hanwang Zhang
Joo-Hwee Lim
CML
250
0
0
17 Sep 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Tem-adapter: Adapting Image-Text Pretraining for Video Question AnswerIEEE International Conference on Computer Vision (ICCV), 2023
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S.Torr
Xiaoping Zhang
Yansong Tang
396
32
0
16 Aug 2023
A Review of Deep Learning for Video Captioning
A Review of Deep Learning for Video CaptioningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
273
48
0
22 Apr 2023
AutoAD: Movie Description in Context
AutoAD: Movie Description in ContextComputer Vision and Pattern Recognition (CVPR), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
300
57
0
29 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Implicit and Explicit Commonsense for Multi-sentence Video CaptioningComputer Vision and Image Understanding (CVIU), 2023
Shih-Han Chou
James J. Little
Leonid Sigal
222
5
0
14 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense
  Video Captioning
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video CaptioningComputer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TSVLM
586
358
0
27 Feb 2023
Video Question Answering with Iterative Video-Text Co-Tokenization
Video Question Answering with Iterative Video-Text Co-TokenizationEuropean Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
291
21
0
01 Aug 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language
  Models
Zero-Shot Video Question Answering via Frozen Bidirectional Language ModelsNeural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
566
285
0
16 Jun 2022
Learning to Answer Visual Questions from Web Videos
Learning to Answer Visual Questions from Web VideosIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
428
41
0
10 May 2022
AssistQ: Affordance-centric Question-driven Task Completion for
  Egocentric Assistant
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric AssistantEuropean Conference on Computer Vision (ECCV), 2022
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
561
36
0
08 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Video Question Answering: Datasets, Algorithms and ChallengesConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
360
119
0
02 Mar 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Bridging Video-text Retrieval with Multiple Choice QuestionsComputer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
380
126
0
13 Jan 2022
Dense Video Captioning Using Unsupervised Semantic Information
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
295
11
0
15 Dec 2021
Transferring Domain-Agnostic Knowledge in Video Question Answering
Transferring Domain-Agnostic Knowledge in Video Question Answering
Tianran Wu
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Haruo Takemura
154
10
0
26 Oct 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural
  Language with Interpretability
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
268
5
0
25 Jun 2021
On the hidden treasure of dialog in video question answering
On the hidden treasure of dialog in video question answeringIEEE International Conference on Computer Vision (ICCV), 2021
Deniz Engin
Franccois Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
268
12
0
26 Mar 2021
Open-Ended Multi-Modal Relational Reasoning for Video Question Answering
Open-Ended Multi-Modal Relational Reasoning for Video Question AnsweringIEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2020
Haozheng Luo
Ruiyang Qin
Chenwei Xu
Guo Ye
Zening Luo
597
6
0
01 Dec 2020
1
Page 1 of 1