MUTAN: Multimodal Tucker Fusion for Visual Question Answering


18 May 2017
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome

Papers citing "MUTAN: Multimodal Tucker Fusion for Visual Question Answering"

50 / 272 papers shown
Interpretable Visual Understanding with Cognitive Attention Network
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Tyler Derr
Mengyu Wang
Eirini Ntoutsi
44
12
0
06 Aug 2021
Ordered Attention for Coherent Visual Storytelling
Tom Braude
Idan Schwartz
A. Schwing
Ariel Shamir
19
9
0
04 Aug 2021
Mapping Vulnerable Populations with AI
B. Kellenberger
John E. Vargas-Muñoz
D. Tuia
Rodrigo Caye Daudt
Konrad Schindler
Thao T-T Whelan
Brenda Ayo
Ferda Ofli
Muhammad Imran
14
1
0
29 Jul 2021
Cycled Compositional Learning between Images and Text
Jongseok Kim
Youngjae Yu
Seunghwan Lee
Gunhee Kim
CoGe
7
3
0
24 Jul 2021
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Haiwei Pan
Shuning He
Kejia Zhang
Bo Qu
Chunling Chen
Kun Shi
12
11
0
07 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
13
18
0
04 Jul 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
22
88
0
25 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
22
10
0
10 Jun 2021
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation
Ruizhi Shao
Gaochang Wu
Yuemei Zhou
Ying Fu
Yebin Liu
ViT
13
42
0
08 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
10
55
0
24 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
16
137
0
17 May 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Bo-wen Li
Guanbin Li
EgoV
VOS
16
123
0
15 May 2021
Relation-aware Hierarchical Attention Framework for Video Question Answering
Fangtao Li
Ting Bai
Chenyu Cao
Zihe Liu
C. Yan
Bin Wu
32
14
0
13 May 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
21
17
0
29 Apr 2021
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
Jae-Won Cho
Dong-Jin Kim
Jinsoo Choi
Yunjae Jung
In So Kweon
VLM
16
17
0
13 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
18
28
0
08 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
26
74
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
34
271
0
07 Apr 2021
RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network
Minchul Shin
Yoonjae Cho
ByungSoo Ko
Geonmo Gu
8
44
0
07 Apr 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
6
139
0
23 Mar 2021
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
Alexandre Ramé
Rémy Sun
Matthieu Cord
UQCV
35
60
0
10 Mar 2021
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Aman Jain
Mayank Kothyari
Vishwajeet Kumar
P. Jyothi
Ganesh Ramakrishnan
Soumen Chakrabarti
13
34
0
09 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
33
36
0
06 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
23
148
0
05 Mar 2021
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
14
176
0
25 Jan 2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
11
21
0
22 Jan 2021
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko
Damien Teney
A. Dick
A. Hengel
6
28
0
15 Jan 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
13
179
0
20 Dec 2020
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
17
3
0
18 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
Driving Behavior Explanation with Multi-level Fusion
H. Ben-younes
Éloi Zablocki
Patrick Pérez
Matthieu Cord
19
30
0
09 Dec 2020
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding
Maryam Rahnemoonfar
Tashnim Chowdhury
Argho Sarkar
D. Varshney
M. Yari
Robin Murphy
9
237
0
05 Dec 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
16
56
0
25 Nov 2020
XTQA: Span-Level Explanations of the Textbook Question Answering
Jie Ma
Q. Zheng
Jun Liu
Qingyu Yin
Jianlong Zhou
Y. Huang
14
12
0
25 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
9
30
0
24 Oct 2020
Combination of Deep Speaker Embeddings for Diarisation
Guangzhi Sun
Chao Zhang
P. Woodland
17
20
0
22 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
13
2
0
17 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei-Neng Chen
Weiping Wang
Li Liu
M. Lew
VLM
110
31
0
16 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
11
2
0
05 Oct 2020
Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Tianrui Hui
Si Liu
Shaofei Huang
Guanbin Li
Sansi Yu
Faxi Zhang
Jizhong Han
8
148
0
01 Oct 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Shaofei Huang
Tianrui Hui
Si Liu
Guanbin Li
Yunchao Wei
Jizhong Han
Luoqi Liu
Bo-wen Li
EgoV
21
176
0
01 Oct 2020
Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering
J. Yu
Zihao Zhu
Yujing Wang
Weifeng Zhang
Yue Hu
Jianlong Tan
6
98
0
31 Aug 2020
LowFER: Low-rank Bilinear Pooling for Link Prediction
Saadullah Amin
Stalin Varanasi
K. Dunfield
G. Neumann
12
40
0
25 Aug 2020
Adaptive Context-Aware Multi-Modal Network for Depth Completion
Shanshan Zhao
Mingming Gong
Huan Fu
Dacheng Tao
13
152
0
25 Aug 2020
Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment
Geeticka Chauhan
Ruizhi Liao
W. Wells
Jacob Andreas
Xin Wang
Seth Berkowitz
Steven Horng
Peter Szolovits
Polina Golland
MedIm
9
52
0
22 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
19
12
0
18 Aug 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe
LRM
ReLM
18
4
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
25
29
0
26 Jul 2020