2206.01017
Structured Two-stream Attention Network for Video Question Answering
2 June 2022
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
Papers citing
"Structured Two-stream Attention Network for Video Question Answering"
21 / 21 papers shown
Title
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen
Y. Liu
Binglin Chen
Jiandong Su
Yongsen Zheng
Liang Lin
BDL
VGen
CML
41
2
0
05 Mar 2025
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao
Jiangtong Li
Li Niu
Liqing Zhang
CoGe
35
3
0
03 Jul 2024
Continual Referring Expression Comprehension via Dual Modular Memorization
Hengtao Shen
Cheng Chen
Peng Wang
Lianli Gao
M. Wang
Jingkuan Song
ObjD
25
3
0
25 Nov 2023
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Zichuan Xu
Haozhao Wang
Xing Di
Weining Lu
Yu Cheng
44
8
0
06 May 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
42
7
0
04 Feb 2023
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
16
16
0
17 Nov 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
10
17
0
05 Oct 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
68
7
0
14 Sep 2022
Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives
Shaoning Xiao
Long Chen
Kaifeng Gao
Zhao Wang
Yi Yang
Zhimeng Zhang
Jun Xiao
6
5
0
25 Apr 2022
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding
Daizong Liu
Xiang Fang
Wei Hu
Pan Zhou
13
37
0
06 Mar 2022
Exploring Motion and Appearance Information for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Pan Zhou
Yang Liu
19
41
0
03 Jan 2022
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
23
111
0
12 Dec 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
32
3
0
10 Sep 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
52
0
19 Jun 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
11
19
0
16 Apr 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
6
17
0
15 Jan 2021
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
8
3
0
18 Dec 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
ORD: Object Relationship Discovery for Visual Dialogue Generation
Ziwei Wang
Zi Huang
Yadan Luo
Huimin Lu
11
4
0
15 Jun 2020
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Seongho Choi
Kyoung-Woon On
Y. Heo
Ahjeong Seo
Youwon Jang
Minsu Lee
Byoung-Tak Zhang
10
51
0
07 May 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016