1904.11574
Cited By
TVQA+: Spatio-Temporal Grounding for Video Question Answering
25 April 2019
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
Papers citing
"TVQA+: Spatio-Temporal Grounding for Video Question Answering"
29 / 29 papers shown
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
37
0
0
03 May 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
109
1
0
12 Dec 2024
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou
Junbin Xiao
Xun Yang
Peipei Song
Dan Guo
Angela Yao
Meng Wang
Tat-Seng Chua
69
1
0
22 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
31
0
0
14 Sep 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
60
20
0
13 Jun 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
35
2
0
30 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
33
176
0
15 May 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
48
35
0
16 Jan 2024
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De-Chun Cheng
Jie Li
Nannan Wang
Xinbo Gao
25
1
0
05 Dec 2023
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
11
0
0
08 Sep 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
19
225
0
16 Jun 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
33
0
10 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
155
0
30 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
20
19
0
23 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
27
27
0
08 Mar 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
14
7
0
08 Feb 2022
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
25
23
0
02 Dec 2021
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
23
20
0
01 Dec 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
215
0
24 Nov 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Mohit Bansal
ViT
14
62
0
20 Jul 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
16
56
0
16 Mar 2021
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
8
6
0
21 Oct 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
6
100
0
08 May 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
28
9
0
23 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
4
67
0
27 Sep 2019
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
135
691
0
08 Jun 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
189
432
0
27 Mar 2018