Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1803.10906
Cited By
Motion-Appearance Co-Memory Networks for Video Question Answering
29 March 2018
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Motion-Appearance Co-Memory Networks for Video Question Answering"
50 / 118 papers shown
Title
Equivariant and Invariant Grounding for Video Question Answering
Yicong Li
Xiang Wang
Junbin Xiao
Tat-Seng Chua
14
25
0
26 Jul 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Yang Liu
Guanbin Li
Liang Lin
LRM
28
80
0
26 Jul 2022
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
142
75
0
12 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Invariant Grounding for Video Question Answering
Yicong Li
Xiang Wang
Junbin Xiao
Wei Ji
Tat-Seng Chua
OOD
15
95
0
06 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
12
49
0
30 May 2022
Learnable Optimal Sequential Grouping for Video Scene Detection
Daniel Rotman
Yevgeny Yaroker
Elad Amrani
Udi Barzelay
Rami Ben-Ari
14
10
0
17 May 2022
Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering
Zenan Xu
Wanjun Zhong
Qinliang Su
Zijing Ou
Fuwei Zhang
12
3
0
13 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
32
33
0
10 May 2022
Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiang-Dong Zhou
16
24
0
09 May 2022
Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives
Shaoning Xiao
Long Chen
Kaifeng Gao
Zhao Wang
Yi Yang
Zhimeng Zhang
Jun Xiao
8
5
0
25 Apr 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
10
6
0
15 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
16
85
0
02 Mar 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
8
35
0
18 Feb 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
3DGS
36
38
0
20 Jan 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
20
191
0
17 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
18
41
0
14 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
30
111
0
12 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
17
13
0
29 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
216
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
27
189
0
19 Nov 2021
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation
Khoa T. Vo
Kevin Hyekang Joo
Kashu Yamazaki
Sang Truong
Kris M. Kitani
Minh-Triet Tran
Ngan Le
EgoV
48
17
0
21 Oct 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
42
3
0
10 Sep 2021
Spatio-Temporal Perturbations for Video Attribution
Zhenqiang Li
Weimin Wang
Zuoyue Li
Yifei Huang
Yoichi Sato
6
6
0
01 Sep 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
15
75
0
10 Jul 2021
Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
25
60
0
25 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
53
0
19 Jun 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
31
440
0
18 May 2021
Relation-aware Hierarchical Attention Framework for Video Question Answering
Fangtao Li
Ting Bai
Chenyu Cao
Zihe Liu
C. Yan
Bin Wu
37
14
0
13 May 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
157
100
0
29 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
18
19
0
16 Apr 2021
Object-Centric Representation Learning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
21
7
0
12 Apr 2021
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Joey Tianyi Zhou
Rick Siow Mong Goh
111
84
0
26 Feb 2021
Temporal Memory Attention for Video Semantic Segmentation
Hao Wang
Weining Wang
Jing Liu
VOS
29
66
0
17 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
CLIP
32
646
0
11 Feb 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
11
18
0
15 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
27
31
0
05 Jan 2021
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
17
3
0
18 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
19
66
0
10 Dec 2020
Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog
Shen Gao
Xiuying Chen
Li Liu
Dongyan Zhao
Rui Yan
14
14
0
05 Nov 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
BDL
10
22
0
18 Oct 2020
Data augmentation techniques for the Video Question Answering task
Alex Falcon
O. Lanz
G. Serra
EgoV
8
3
0
22 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN
BDL
9
172
0
07 Aug 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang-Dong Yoo
10
72
0
04 Jul 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
A. Hengel
Qi Wu
10
60
0
01 Jun 2020
Character Matters: Video Story Understanding with Character-Aware Relations
Shijie Geng
Ji Zhang
Zuohui Fu
Peng Gao
Hang Zhang
Gerard de Melo
18
11
0
09 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
15
311
0
29 Apr 2020
Video Object Grounding using Semantic Roles in Language Description
Arka Sadhu
Kan Chen
Ram Nevatia
13
48
0
24 Mar 2020
Previous
1
2
3
Next