arXiv:1708.01471
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
4 August 2017
Zhou Yu
Jun Yu
Jianping Fan
Dacheng Tao
Papers citing "Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering"
50 / 214 papers shown
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Andrés Mafla
S. Dey
Ali Furkan Biten
Lluís Gómez
Dimosthenis Karatzas
8
26
0
14 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
21
318
0
10 Jan 2020
Low Rank Factorization for Compact Multi-Head Self-Attention
Sneha Mehta
Huzefa Rangwala
Naren Ramakrishnan
25
5
0
26 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
16
7
0
26 Nov 2019
Two Causal Principles for Improving Visual Dialog
Jiaxin Qi
Yulei Niu
Jianqiang Huang
Hanwang Zhang
CML
16
146
0
24 Nov 2019
Unsupervised Keyword Extraction for Full-sentence VQA
Kohei Uehara
Tatsuya Harada
14
1
0
23 Nov 2019
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Ruize Wang
Zhongyu Wei
Ying Cheng
Piji Li
Haijun Shan
Ji Zhang
Qi Zhang
Xuanjing Huang
VGen
DiffM
15
13
0
11 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
24
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
27
320
0
10 Nov 2019
Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li
Jing Jiang
19
0
0
10 Nov 2019
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
Tao Jin
Siyu Huang
Yingming Li
Zhongfei Zhang
12
20
0
01 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
19
38
0
11 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
W. Wang
Jingjing Liu
LRM
17
68
0
08 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
28
59
0
26 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
25
13
0
23 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
19
1
0
17 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Jiacheng Liu
J. Hockenmaier
10
10
0
01 Sep 2019
Attention-based Fusion for Outfit Recommendation
Katrien Laenen
Marie-Francine Moens
CVBM
12
7
0
28 Aug 2019
Mobile Video Action Recognition
Yuqi Huo
Xiaoli Xu
Yao Lu
Yulei Niu
Zhiwu Lu
Ji-Rong Wen
17
14
0
27 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
28
156
0
20 Aug 2019
Mixed High-Order Attention Network for Person Re-Identification
Binghui Chen
Weihong Deng
Jiani Hu
CVBM
9
353
0
16 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
19
38
0
12 Aug 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
27
50
0
23 Jul 2019
The Resale Price Prediction of Secondhand Jewelry Items Using a Multi-modal Deep Model with Iterative Co-Attention
Yusuke Yamaura
Nobuya Kanemaki
Y. Tsuboshita
10
3
0
01 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
13
796
0
25 Jun 2019
Audio-Visual Kinship Verification
Xiaoting Wu
Eric Granger
Xiaoyi Feng
CVBM
14
3
0
24 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim
Mohit Bansal
CoGe
11
20
0
14 Jun 2019
Relationship-Embedded Representation Learning for Grounding Referring Expressions
Sibei Yang
Guanbin Li
Yizhou Yu
ObjD
25
52
0
11 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
8
434
0
06 Jun 2019
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation
Yan Zhang
Krikamol Muandet
Qianli Ma
Heiko Neumann
Siyu Tang
26
3
0
03 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
11
374
0
20 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Y. Liu
Yinglong Wang
Mohan S. Kankanhalli
14
36
0
13 May 2019
HAR-Net: Joint Learning of Hybrid Attention for Single-stage Object Detection
Yali Li
Shengjin Wang
22
32
0
25 Apr 2019
Progressive Attention Memory Network for Movie Story Question Answering
Junyeong Kim
Minuk Ma
Kyungsu Kim
Sungjin Kim
Chang-Dong Yoo
11
76
0
18 Apr 2019
Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
Anjan Dutta
Zeynep Akata
33
143
0
08 Mar 2019
Image-Question-Answer Synergistic Network for Visual Dialog
Dalu Guo
Chang Xu
Dacheng Tao
6
74
0
26 Feb 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
19
271
0
25 Feb 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
11
104
0
01 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
14
218
0
31 Jan 2019
Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition
Yuanyuan Zhang
Zirui Wang
Jun Du
11
31
0
15 Jan 2019
Local Temporal Bilinear Pooling for Fine-grained Action Parsing
Yan Zhang
Siyu Tang
Krikamol Muandet
Christian Jarvers
Heiko Neumann
13
21
0
05 Dec 2018
Generating Easy-to-Understand Referring Expressions for Target Identifications
Mikihiro Tanaka
Takayuki Itamochi
Kenichi Narioka
Ikuro Sato
Yoshitaka Ushiku
Tatsuya Harada
8
1
0
29 Nov 2018
Visual Question Answering as Reading Comprehension
Hui Li
Peng Wang
Chunhua Shen
A. Hengel
9
40
0
29 Nov 2018
VQA with no questions-answers training
B. Vatashsky
S. Ullman
33
12
0
20 Nov 2018
EA-LSTM: Evolutionary Attention-based LSTM for Time Series Prediction
Youru Li
Zhenfeng Zhu
Deqiang Kong
Jinhyuk Lee
Yao Zhao
AI4TS
23
354
0
09 Nov 2018
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
Daesik Kim
Seonhoon Kim
Nojun Kwak
9
2
0
01 Nov 2018
Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures
B. Vatashsky
S. Ullman
CoGe
18
1
0
25 Oct 2018
Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification
Jinlai Liu
Zehuan Yuan
Changhu Wang
16
9
0
16 Sep 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
25
55
0
06 Sep 2018