DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

7 May 2020
Seongho Choi
Kyoung-Woon On
Y. Heo
Ahjeong Seo
Youwon Jang
Minsu Lee
Byoung-Tak Zhang

Papers citing "DramaQA: Character-Centered Video Story Understanding with Hierarchical QA"

38 / 38 papers shown
Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
Da Li
Yuxiao Luo
Keping Bi
Jiafeng Guo
Wei Yuan
B. Yang
Yan Wang
Fan Yang
Tingting Gao
Guorui Zhou
VLM
337
1
0
11 Nov 2025
Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media
Zihan Ding
Junlong Chen
Per Ola Kristensson
Junxiao Shen
Xinyi Wang
VGen
299
2
0
20 Sep 2025
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding
Gowreesh Mago
Pascal Mettes
Stevan Rudinac
191
0
0
28 Aug 2025
Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs
Yi-Chun Chen
NAI
135
1
0
20 Aug 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
532
1
0
10 May 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
426
7
0
25 Mar 2025
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Han Wang
Kai Hu
Liangcai Gao
688
4
0
20 Mar 2025
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chen Cai
Zheng Wang
J. Gao
Wenyang Liu
Ye Lu
Runzhong Zhang
Kim-Hui Yap
CLL
437
14
0
20 Jan 2025
DragonVerseQA: Open-Domain Long-Form Context-Aware Question-Answering
A. Lahiri
Qinmin Vivian Hu
255
1
0
21 Dec 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
345
0
0
14 Sep 2024
Question-Answering Dense Video Events
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
704
10
0
06 Sep 2024
Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yidan Sun
Jianfei Yu
Boyang Li
326
0
0
18 Jun 2024
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
291
8
0
14 Jun 2024
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
AAAI Conference on Artificial Intelligence (AAAI), 2024
Minjun Kim
Seungwoo Song
Youhan Lee
Haneol Jang
Kyungtae Lim
317
10
0
12 Jan 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
571
35
0
03 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
495
172
0
28 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
395
32
0
08 Dec 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
516
7
0
02 Nov 2023
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
471
61
0
24 Oct 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
Neural Information Processing Systems (NeurIPS), 2023
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
493
571
0
17 Aug 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
IEEE International Conference on Computer Vision (ICCV), 2023
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S. Torr
Xiaoping Zhang
Yansong Tang
396
32
0
16 Aug 2023
Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions
Autonomous Robots (Auton. Robots), 2023
Chad DeChant
Iretiayo Akinola
Daniel Bauer
270
14
0
16 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Jiaming Song
MLLM
214
38
0
15 Jun 2023
Connecting Vision and Language with Video Localized Narratives
Computer Vision and Pattern Recognition (CVPR), 2023
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
411
32
0
22 Feb 2023
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Minjoon Jung
Seongho Choi
Joo-Kyung Kim
Jin-Hwa Kim
Byoung-Tak Zhang
273
11
0
23 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
378
2
0
08 Oct 2022
WildQA: In-the-Wild Video Question Answering
International Conference on Computational Linguistics (COLING), 2022
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Amélie Reymond
373
9
0
14 Sep 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
566
285
0
16 Jun 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
428
41
0
10 May 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
360
119
0
02 Mar 2022
Toward a Human-Level Video Understanding Intelligence
Y. Heo
Minsu Lee
Seongho Choi
Woo Suk Choi
Minjung Shin
Minjoon Jung
Jeh-Kwang Ryu
Byoung-Tak Zhang
235
0
0
08 Oct 2021
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
274
2
0
11 Aug 2021
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding
Minjung Shin
Seongho Choi
Y. Heo
M. Lee
Byoung-Tak Zhang
Jeh-Kwang Ryu
387
2
0
21 Jul 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
250
57
0
19 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Neural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
520
439
0
04 Jun 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
360
21
0
15 Jan 2021
Co-attentional Transformers for Story-Based Video Understanding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Björn Bebensee
Byoung-Tak Zhang
255
7
0
27 Oct 2020
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
AAAI Conference on Artificial Intelligence (AAAI), 2020
Seonhoon Kim
Seohyeong Jeong
Eunbyul Kim
Inho Kang
Nojun Kwak
SSL
410
44
0
17 Sep 2020