1804.00775
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
3 April 2018
Duy-Kien Nguyen
Takayuki Okatani
Papers citing
"Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering"
50 / 102 papers shown
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Neural Networks (NN), 2021
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
149
23
0
22 Jan 2021
End-to-End Object Detection with Adaptive Clustering Transformer
British Machine Vision Conference (BMVC), 2020
Minghang Zheng
Shiyang Feng
Renrui Zhang
Kunchang Li
Xiaogang Wang
Jiaming Song
Hao Dong
ViT
287
222
0
18 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
119
55
0
04 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
245
6
0
19 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Neurocomputing (Neurocomputing), 2020
Wei Chen
Weiping Wang
Tianpeng Liu
M. Lew
VLM
297
35
0
16 Oct 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
Tuong Khanh Long Do
Binh X. Nguyen
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
Thanh-Toan Do
123
2
0
23 Sep 2020
Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw
CVBM
406
708
0
10 Sep 2020
Co-Saliency Detection with Co-Attention Fully Convolutional Network
Guangshuai Gao
Wenting Zhao
Qingjie Liu
Yunhong Wang
125
33
0
20 Aug 2020
DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification
Libo Qin
Wanxiang Che
Yangming Li
Minheng Ni
Ting Liu
194
99
0
16 Aug 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
137
72
0
14 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2020
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Zhuliang Yu
Chuang Gan
GNN
BDL
193
184
0
07 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2020
Liu Yang
VLM
139
5
0
02 Aug 2020
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
International Conference on Neural Information Processing (ICONIP), 2020
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe
LRM
ReLM
160
4
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Shiyang Feng
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
144
29
0
26 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
European Conference on Computer Vision (ECCV), 2020
K. Gouthaman
Anurag Mittal
261
87
0
13 Jul 2020
Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network
IEEE Robotics and Automation Letters (RA-L), 2020
Tadashi Ogura
A. Magassouba
K. Sugiura
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Hisashi Kawai
97
11
0
09 Jul 2020
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
Guolei Sun
Wenguan Wang
Jifeng Dai
Luc Van Gool
454
345
0
03 Jul 2020
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering
C. Sur
82
9
0
25 Jun 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
105
73
0
08 May 2020
Deep Multimodal Neural Architecture Search
ACM Multimedia (ACM MM), 2020
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
149
107
0
25 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
137
23
0
24 Apr 2020
An Entropy Clustering Approach for Assessing Visual Question Difficulty
IEEE Access (IEEE Access), 2020
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shin'ichi Satoh
OOD
AAML
205
1
0
12 Apr 2020
CQ-VQA: Visual Question Answering on Categorized Questions
IEEE International Joint Conference on Neural Network (IJCNN), 2020
Aakansha Mishra
A. Anand
Prithwijit Guha
208
8
0
17 Feb 2020
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Computer Vision and Pattern Recognition (CVPR), 2019
Xiankai Lu
Wenguan Wang
Chao Ma
Jianbing Shen
Ling Shao
Fatih Porikli
VOS
195
518
0
19 Jan 2020
Modality-Balanced Models for Visual Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2020
Hyounghun Kim
Hao Tan
Joey Tianyi Zhou
93
29
0
17 Jan 2020
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Lei Shi
Shijie Geng
Kai Shuang
Chiori Hori
Songxiang Liu
Shiyang Feng
Sen Su
225
12
0
03 Jan 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
IEEE Robotics and Automation Letters (RA-L), 2019
A. Magassouba
K. Sugiura
Hisashi Kawai
116
10
0
23 Dec 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
223
7
0
26 Nov 2019
Meta Module Network for Compositional Visual Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
253
75
0
08 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2019
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
102
63
0
26 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
IEEE International Conference on Computer Vision (ICCV), 2019
Zihao Wang
Xihui Liu
Jiaming Song
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
264
336
0
12 Sep 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
221
44
0
12 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2019
Shiyang Feng
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Jiaming Song
139
85
0
10 Aug 2019
The Resale Price Prediction of Secondhand Jewelry Items Using a Multi-modal Deep Model with Iterative Co-Attention
Yusuke Yamaura
Nobuya Kanemaki
Y. Tsuboshita
142
5
0
01 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2019
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
269
912
0
25 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
160
422
0
20 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Neural Information Processing Systems (NeurIPS), 2019
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
154
90
0
15 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Zichen Liu
Yinglong Wang
Mohan Kankanhalli
166
37
0
13 May 2019
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Julia Kruk
Jonah Lubin
Karan Sikka
Xiaoyu Lin
Dan Jurafsky
Ajay Divakaran
250
106
0
19 Apr 2019
What Object Should I Use? - Task Driven Object Detection
Johann Sawatzky
Yaser Souri
C. Grund
Juergen Gall
ObjD
147
32
0
05 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Computer Vision and Pattern Recognition (CVPR), 2019
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Jiaming Song
ObjD
228
211
0
03 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Computer Vision and Pattern Recognition (CVPR), 2019
Robik Shrestha
Kushal Kafle
Christopher Kanan
262
86
0
01 Mar 2019
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Shiyang Feng
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Jiaming Song
AIMat
400
392
0
13 Dec 2018
Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions
Masanori Suganuma
Xing Liu
Takayuki Okatani
212
90
0
03 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
208
56
0
03 Dec 2018
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding
Hassan Akbari
Svebor Karaman
Surabhi Bhargava
Brian Chen
Carl Vondrick
Shih-Fu Chang
132
86
0
28 Nov 2018
LSTA: Long Short-Term Attention for Egocentric Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
191
151
0
26 Nov 2018
VQA with no questions-answers training
Computer Vision and Pattern Recognition (CVPR), 2018
B. Vatashsky
S. Ullman
192
13
0
20 Nov 2018
Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures
B. Vatashsky
S. Ullman
CoGe
106
2
0
25 Oct 2018
Knowing Where to Look? Analysis on Attention of Visual Question Answering System
Wei Li
Zehuan Yuan
Xiangzhong Fang
Changhu Wang
74
8
0
09 Oct 2018