DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

7 May 2020

Papers citing "DramaQA: Character-Centered Video Story Understanding with Hierarchical QA"

Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding

337

11 Nov 2025

Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media

299

20 Sep 2025

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

Gowreesh Mago

Pascal Mettes

Stevan Rudinac

191

28 Aug 2025

Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs

Yi-Chun Chen

NAI

135

20 Aug 2025

Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation

532

10 May 2025

ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models

426

25 Mar 2025

DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Han Wang

Kai Hu

Liangcai Gao

688

20 Mar 2025

Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

437

20 Jan 2025

DragonVerseQA: Open-Domain Long-Form Context-Aware Question-Answering

A. Lahiri

Qinmin Vivian Hu

255

21 Dec 2024

QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems

345

14 Sep 2024

Question-Answering Dense Video Events

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024

704

06 Sep 2024

Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Yidan Sun

Jianfei Yu

Boyang Li

326

18 Jun 2024

A Survey of Video Datasets for Grounded Event Understanding

Kate Sanders

Benjamin Van Durme

291

14 Jun 2024

BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

AAAI Conference on Artificial Intelligence (AAAI), 2024

317

12 Jan 2024

Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

Mohammed Bennamoun

571

03 Jan 2024

A Simple LLM Framework for Long-Range Video Question-Answering

Mohit Bansal

Gedas Bertasius

495

172

28 Dec 2023

LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering

395

08 Dec 2023

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

Jiwan Chung

Youngjae Yu

516

02 Nov 2023

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

471

24 Oct 2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems (NeurIPS), 2023

K. Mangalam

Raiymbek Akshulakov

Jitendra Malik

493

571

17 Aug 2023

Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

IEEE International Conference on Computer Vision (ICCV), 2023

396

16 Aug 2023

Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions

Autonomous Robots (Auton. Robots), 2023

Chad DeChant

Iretiayo Akinola

Daniel Bauer

270

16 Jun 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Yi Wang

Yu Qiao

Jiaming Song

MLLM

214

15 Jun 2023

Connecting Vision and Language with Video Localized Narratives

Computer Vision and Pattern Recognition (CVPR), 2023

411

22 Feb 2023

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

273

23 Oct 2022

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

British Machine Vision Conference (BMVC), 2022

Hsin-Ying Lee

Hung-Ting Su

378

08 Oct 2022

WildQA: In-the-Wild Video Question Answering

International Conference on Computational Linguistics (COLING), 2022

373

14 Sep 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Neural Information Processing Systems (NeurIPS), 2022

566

285

16 Jun 2022

Learning to Answer Visual Questions from Web Videos

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

428

10 May 2022

Video Question Answering: Datasets, Algorithms and Challenges

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Wei Ji

360

119

02 Mar 2022

Toward a Human-Level Video Understanding Intelligence

235

08 Oct 2021

Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering

274

11 Aug 2021

CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding

387

21 Jul 2021

Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

250

19 Jun 2021

MERLOT: Multimodal Neural Script Knowledge Models

Neural Information Processing Systems (NeurIPS), 2021

Yejin Choi

520

439

04 Jun 2021

Recent Advances in Video Question Answering: A Review of Datasets and Methods

Devshree Patel

Ratnam Parikh

Yesha Shastri

360

15 Jan 2021

Co-attentional Transformers for Story-Based Video Understanding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Björn Bebensee

Byoung-Tak Zhang

255

27 Oct 2020

Self-supervised pre-training and contrastive representation learning for multiple-choice video QA

AAAI Conference on Artificial Intelligence (AAAI), 2020

410

17 Sep 2020