Structured Two-stream Attention Network for Video Question Answering

2 June 2022
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen

Papers citing "Structured Two-stream Attention Network for Video Question Answering"

21 / 21 papers shown
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen
Y. Liu
Binglin Chen
Jiandong Su
Yongsen Zheng
Liang Lin
BDL
VGen
CML
41
2
0
05 Mar 2025
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao
Jiangtong Li
Li Niu
Liqing Zhang
CoGe
35
3
0
03 Jul 2024
Continual Referring Expression Comprehension via Dual Modular Memorization
Hengtao Shen
Cheng Chen
Peng Wang
Lianli Gao
M. Wang
Jingkuan Song
ObjD
25
3
0
25 Nov 2023
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Zichuan Xu
Haozhao Wang
Xing Di
Weining Lu
Yu Cheng
44
8
0
06 May 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
42
7
0
04 Feb 2023
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
16
16
0
17 Nov 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
10
17
0
05 Oct 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
68
7
0
14 Sep 2022
Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives
Shaoning Xiao
Long Chen
Kaifeng Gao
Zhao Wang
Yi Yang
Zhimeng Zhang
Jun Xiao
6
5
0
25 Apr 2022
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding
Daizong Liu
Xiang Fang
Wei Hu
Pan Zhou
13
37
0
06 Mar 2022
Exploring Motion and Appearance Information for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Pan Zhou
Yang Liu
19
41
0
03 Jan 2022
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
23
111
0
12 Dec 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
32
3
0
10 Sep 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
52
0
19 Jun 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
11
19
0
16 Apr 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
6
17
0
15 Jan 2021
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
8
3
0
18 Dec 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
ORD: Object Relationship Discovery for Visual Dialogue Generation
Ziwei Wang
Zi Huang
Yadan Luo
Huimin Lu
11
4
0
15 Jun 2020
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Seongho Choi
Kyoung-Woon On
Y. Heo
Ahjeong Seo
Youwon Jang
Minsu Lee
Byoung-Tak Zhang
10
51
0
07 May 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016