arXiv: 1809.01696
TVQA: Localized, Compositional Video Question Answering


5 September 2018
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg

Papers citing "TVQA: Localized, Compositional Video Question Answering"

50 / 126 papers shown
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Minjoon Jung
Seongho Choi
Joo-Kyung Kim
Jin-Hwa Kim
Byoung-Tak Zhang
34
7
0
23 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
16
3
0
19 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
33
21
0
17 Oct 2022
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi Ren Fung
Tuhin Chakrabarty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
29
17
0
05 Oct 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
70
7
0
14 Sep 2022
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
18
0
0
08 Sep 2022
Interactive Question Answering Systems: Literature Review
Giovanni Maria Biancofiore
Yashar Deldjoo
T. D. Noia
E. Sciascio
F. Narducci
34
13
0
04 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
21
63
0
04 Sep 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
32
12
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
20
18
0
01 Aug 2022
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
Li Xu
Haoxuan Qu
Jason Kuen
Jiuxiang Gu
Jun Liu
CML
27
27
0
23 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Mohit Bansal
22
6
0
08 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
36
227
0
16 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
20
81
0
14 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
27
156
0
03 Jun 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
30
16
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
34
33
0
10 May 2022
Episodic Memory Question Answering
Samyak Datta
Sameer Dharur
Vincent Cartillier
Ruta Desai
Mukul Khanna
Dhruv Batra
Devi Parikh
EgoV
11
31
0
03 May 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
136
0
26 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
28
16
0
14 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
32
27
0
08 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
24
85
0
02 Mar 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
212
0
18 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
22
7
0
08 Feb 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
26
207
0
07 Jan 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Mohit Bansal
Avirup Sil
Shih-Fu Chang
A. Schwing
Heng Ji
17
31
0
20 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
22
191
0
17 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
26
46
0
15 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Mohit Bansal
VLM
VPVLM
29
342
0
13 Dec 2021
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Hariom A. Pandya
Brijesh S. Bhatt
40
27
0
07 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
30
23
0
02 Dec 2021
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
39
216
0
24 Nov 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding
Zhenfang Chen
Tao Du
Ping Luo
J. Tenenbaum
Chuang Gan
VGen
PINN
OCL
30
74
0
28 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
25
78
0
11 Oct 2021
Distantly-Supervised Evidence Retrieval Enables Question Answering without Evidence Annotation
Chen Zhao
Chenyan Xiong
Jordan L. Boyd-Graber
Hal Daumé
RALM
21
8
0
10 Oct 2021
Natural Language Video Localization with Learnable Moment Proposals
Shaoning Xiao
Long Chen
Jian Shao
Yueting Zhuang
Jun Xiao
11
43
0
22 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
30
47
0
16 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
25
36
0
09 Sep 2021
Support-Set Based Cross-Supervision for Video Grounding
Xinpeng Ding
N. Wang
Shiwei Zhang
De-Chun Cheng
Xiaomeng Li
Ziyuan Huang
Mingqian Tang
Xinbo Gao
33
42
0
24 Aug 2021
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
16
2
0
11 Aug 2021
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
11
51
0
12 Jul 2021
Interventional Video Grounding with Dual Contrastive Learning
Guoshun Nan
Rui Qiao
Yao Xiao
Jun Liu
Sicong Leng
H. Zhang
Wei Lu
23
144
0
21 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
53
0
19 Jun 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
19
19
0
18 Jun 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
159
100
0
29 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee Kenneth Wong
J. Tenenbaum
Chuang Gan
VGen
36
92
0
30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Bernard Ghanem
Brais Martinez
15
27
0
28 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang-Dong Yoo
18
26
0
24 Mar 2021