1705.06676
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
18 May 2017
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
Papers citing
"MUTAN: Multimodal Tucker Fusion for Visual Question Answering"
50 / 272 papers shown
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Ruixue Tang
Chao Ma
W. Zhang
Qi Wu
Xiaokang Yang
OOD
21
48
0
19 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
42
78
0
13 Jul 2020
Improving VQA and its Explanations by Comparing Competing Explanations
Jialin Wu
Liyan Chen
Raymond J. Mooney
FAtt
AAML
33
17
0
28 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
8
3
0
17 Jun 2020
Generalising Recursive Neural Models by Tensor Decomposition
Daniele Castellana
D. Bacciu
12
4
0
17 Jun 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
J. Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
17
125
0
16 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
36
394
0
08 Jun 2020
M2P2: Multimodal Persuasion Prediction using Adaptive Fusion
Chongyang Bai
Haipeng Chen
Srijan Kumar
J. Leskovec
V. S. Subrahmanian
12
10
0
03 Jun 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
14
126
0
17 May 2020
A novel multimodal approach for hybrid brain-computer interface
Zhe Sun
Zihao Huang
F. Duan
Yu Liu
13
40
0
25 Apr 2020
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
11
98
0
25 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
25
23
0
24 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
19
2
0
10 Apr 2020
Query-controllable Video Summarization
Jia-Hong Huang
M. Worring
6
46
0
07 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
30
434
0
02 Apr 2020
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
12
111
0
31 Mar 2020
GPS-Net: Graph Property Sensing Network for Scene Graph Generation
Xin Lin
Changxing Ding
Jinquan Zeng
Dacheng Tao
31
277
0
29 Mar 2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data
Youngjae Yu
Seunghwan Lee
Yuncheol Choi
Gunhee Kim
CoGe
11
37
0
27 Mar 2020
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao
Xiaodan Liang
Keze Wang
Liang Lin
GNN
13
3
0
23 Mar 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
64
205
0
16 Mar 2020
A Question-Centric Model for Visual Question Answering in Medical Imaging
Minh H. Vu
Tommy Löfstedt
T. Nyholm
Raphael Sznitman
MedIm
6
59
0
02 Mar 2020
Tensor Decompositions in Deep Learning
D. Bacciu
Danilo P. Mandic
25
14
0
26 Feb 2020
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra
A. Anand
Prithwijit Guha
25
6
0
17 Feb 2020
Sparse and Structured Visual Attention
Pedro Henrique Martins
S. Becker
Zita Marinho
Michael Arens
27
8
0
13 Feb 2020
Component Analysis for Visual Question Answering Architectures
Camila Kolling
Jonatas Wehrmann
Rodrigo C. Barros
CoGe
13
2
0
12 Feb 2020
H-OWAN: Multi-distorted Image Restoration with Tensor 1x1 Convolution
Zihao Huang
Chao Li
Feng Duan
Qibin Zhao
14
5
0
29 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
21
17
0
20 Jan 2020
Modality-Balanced Models for Visual Dialogue
Hyounghun Kim
Hao Tan
Mohit Bansal
20
27
0
17 Jan 2020
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Andrés Mafla
S. Dey
Ali Furkan Biten
Lluís Gómez
Dimosthenis Karatzas
6
26
0
14 Jan 2020
Visual Question Answering on 360° Images
Shih-Han Chou
Wei-Lun Chao
Wei-Sheng Lai
Min Sun
Ming-Hsuan Yang
14
20
0
10 Jan 2020
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
19
14
0
06 Dec 2019
Deep Bayesian Active Learning for Multiple Correct Outputs
Khaled Jedoui
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
BDL
OOD
UQCV
14
14
0
02 Dec 2019
Assessing the Robustness of Visual Question Answering Models
Jia-Hong Huang
Modar Alfadly
Bernard Ghanem
M. Worring
AAML
OOD
15
23
0
30 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
18
276
0
20 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAML
FAtt
34
26
0
19 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
18
195
0
14 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
24
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
27
320
0
10 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Weijiang Yu
Jingwen Zhou
Weihao Yu
Xiaodan Liang
Nong Xiao
LRM
17
46
0
25 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
9
76
0
23 Oct 2019
Enforcing Reasoning in Visual Commonsense Reasoning
Hammad A. Ayyubi
Md. Mehrab Tanjim
D. Kriegman
ReLM
OOD
14
2
0
21 Oct 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
19
38
0
11 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
29
294
0
06 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
15
67
0
27 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
28
59
0
26 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
25
13
0
23 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
19
77
0
22 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
14
1
0
17 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
22
13
0
11 Sep 2019