Motion-Appearance Co-Memory Networks for Video Question Answering
arXiv:1803.10906

29 March 2018
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia

Papers citing "Motion-Appearance Co-Memory Networks for Video Question Answering"

50 / 118 papers shown
Equivariant and Invariant Grounding for Video Question Answering
Yicong Li
Xiang Wang
Junbin Xiao
Tat-Seng Chua
14
25
0
26 Jul 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Yang Liu
Guanbin Li
Liang Lin
LRM
28
80
0
26 Jul 2022
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
142
75
0
12 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Invariant Grounding for Video Question Answering
Yicong Li
Xiang Wang
Junbin Xiao
Wei Ji
Tat-Seng Chua
OOD
15
95
0
06 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
12
49
0
30 May 2022
Learnable Optimal Sequential Grouping for Video Scene Detection
Daniel Rotman
Yevgeny Yaroker
Elad Amrani
Udi Barzelay
Rami Ben-Ari
14
10
0
17 May 2022
Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering
Zenan Xu
Wanjun Zhong
Qinliang Su
Zijing Ou
Fuwei Zhang
12
3
0
13 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
32
33
0
10 May 2022
Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiang-Dong Zhou
16
24
0
09 May 2022
Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives
Shaoning Xiao
Long Chen
Kaifeng Gao
Zhao Wang
Yi Yang
Zhimeng Zhang
Jun Xiao
8
5
0
25 Apr 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
10
6
0
15 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
16
85
0
02 Mar 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
8
35
0
18 Feb 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
3DGS
36
38
0
20 Jan 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
20
191
0
17 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
18
41
0
14 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
30
111
0
12 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
17
13
0
29 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
216
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
27
189
0
19 Nov 2021
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation
Khoa T. Vo
Kevin Hyekang Joo
Kashu Yamazaki
Sang Truong
Kris M. Kitani
Minh-Triet Tran
Ngan Le
EgoV
48
17
0
21 Oct 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
42
3
0
10 Sep 2021
Spatio-Temporal Perturbations for Video Attribution
Zhenqiang Li
Weimin Wang
Zuoyue Li
Yifei Huang
Yoichi Sato
6
6
0
01 Sep 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
15
75
0
10 Jul 2021
Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
25
60
0
25 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
53
0
19 Jun 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
31
440
0
18 May 2021
Relation-aware Hierarchical Attention Framework for Video Question Answering
Fangtao Li
Ting Bai
Chenyu Cao
Zihe Liu
C. Yan
Bin Wu
37
14
0
13 May 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
157
100
0
29 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
18
19
0
16 Apr 2021
Object-Centric Representation Learning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
21
7
0
12 Apr 2021
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Joey Tianyi Zhou
Rick Siow Mong Goh
111
84
0
26 Feb 2021
Temporal Memory Attention for Video Semantic Segmentation
Hao Wang
Weining Wang
Jing Liu
VOS
29
66
0
17 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
CLIP
32
646
0
11 Feb 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
11
18
0
15 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
27
31
0
05 Jan 2021
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
17
3
0
18 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
19
66
0
10 Dec 2020
Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog
Shen Gao
Xiuying Chen
Li Liu
Dongyan Zhao
Rui Yan
14
14
0
05 Nov 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
BDL
10
22
0
18 Oct 2020
Data augmentation techniques for the Video Question Answering task
Alex Falcon
O. Lanz
G. Serra
EgoV
8
3
0
22 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN
BDL
9
172
0
07 Aug 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang-Dong Yoo
10
72
0
04 Jul 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
A. Hengel
Qi Wu
10
60
0
01 Jun 2020
Character Matters: Video Story Understanding with Character-Aware Relations
Shijie Geng
Ji Zhang
Zuohui Fu
Peng Gao
Hang Zhang
Gerard de Melo
18
11
0
09 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
15
311
0
29 Apr 2020
Video Object Grounding using Semantic Roles in Language Description
Arka Sadhu
Kan Chen
Ram Nevatia
13
48
0
24 Mar 2020