Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.06676
Cited By
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
18 May 2017
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MUTAN: Multimodal Tucker Fusion for Visual Question Answering"
50 / 272 papers shown
Title
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
96
0
0
17 Apr 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
77
8
0
21 Feb 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
36
0
0
13 Jan 2025
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
39
0
0
24 Dec 2024
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
117
58
0
22 Dec 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry
Wenjun Hou
Yi Cheng
Kaishuai Xu
Yan Hu
Wenjie Li
Jiang-Dong Liu
28
0
0
17 Nov 2024
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
Zirun Guo
Tao Jin
Jingyuan Chen
Zhou Zhao
32
6
0
03 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
26
0
0
30 Oct 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
Haiwen Diao
Ying Zhang
Shang Gao
Jiawen Zhu
Long Chen
Huchuan Lu
29
4
0
20 Oct 2024
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Arpan Phukan
Manish Gupta
Asif Ekbal
VGen
42
0
0
13 Oct 2024
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
Yili Li
Jing Yu
Keke Gai
Gang Xiong
21
0
0
15 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
38
13
0
09 Aug 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
Wenbin An
Feng Tian
Jiahao Nie
Wenkai Shi
Haonan Lin
Yan Chen
Qianying Wang
Y. Wu
Guang Dai
Ping Chen
VLM
45
4
0
22 Jul 2024
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
Wenrui Li
Penghong Wang
Ruiqin Xiong
Xiaopeng Fan
32
8
0
11 Jul 2024
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA
Elham J. Barezi
Parisa Kordjamshidi
CoGe
22
0
0
27 Jun 2024
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
40
15
0
03 Jun 2024
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
Runlong He
Mengya Xu
Adrito Das
Danyal Z. Khan
Sophia Bano
Hani J. Marcus
Danail Stoyanov
Matthew J. Clarkson
Mobarakol Islam
45
6
0
22 May 2024
Find The Gap: Knowledge Base Reasoning For Visual Question Answering
Elham J. Barezi
Parisa Kordjamshidi
24
0
0
16 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
36
10
0
12 Apr 2024
Generative Multi-modal Models are Good Class-Incremental Learners
Xusheng Cao
Haori Lu
Linlan Huang
Xialei Liu
Ming-Ming Cheng
CLL
41
10
0
27 Mar 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
40
14
0
22 Mar 2024
Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering
Junnan Dong
Qinggang Zhang
Huachi Zhou
Daochen Zha
Pai Zheng
Xiao Huang
40
8
0
20 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li
Laurence T. Yang
Bocheng Ren
Xin Nie
Zhangyang Gao
Cheng Tan
Stan Z. Li
VLM
13
11
0
03 Feb 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
42
1
0
31 Jan 2024
Collaborative Position Reasoning Network for Referring Image Segmentation
Jianjian Cao
Beiya Dai
Yulin Li
Xiameng Qin
Jingdong Wang
23
0
0
22 Jan 2024
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM
VLM
25
31
0
05 Dec 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
23
3
0
28 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
35
36
0
01 Nov 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
14
2
0
03 Oct 2023
Syntax Tree Constrained Graph Network for Visual Question Answering
Xiangrui Su
Qi Zhang
Chongyang Shi
Jiachang Liu
Liang Hu
GNN
NAI
29
3
0
17 Sep 2023
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
Manal Helal
17
0
0
05 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
A. Kalyan
A. Deshpande
Neeraj Kumar
14
0
0
31 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
32
19
0
28 Aug 2023
Simple Baselines for Interactive Video Retrieval with Questions and Answers
Kaiqu Liang
Samuel Albanie
22
2
0
21 Aug 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
24
18
0
21 Jul 2023
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
17
20
0
11 Jul 2023
Localized Questions in Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
17
8
0
03 Jul 2023
Deep Equilibrium Multimodal Fusion
Jinhong Ni
Yalong Bai
Wei Zhang
Ting Yao
Tao Mei
24
1
0
29 Jun 2023
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark
Li Xu
Bo Liu
Ameer Hamza Khan
Lu Fan
Xiao-Ming Wu
LM&MA
25
9
0
10 Jun 2023
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu
Shenmin Zhang
Gukyeong Kwon
Pramuditha Perera
Henghui Zhu
...
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Dan Roth
Bing Xiang
27
19
0
30 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
35
13
0
10 May 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
11
39
0
19 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Bernard Ghanem
M. Worring
OOD
AAML
33
5
0
06 Apr 2023
Logical Implications for Visual Question Answering Consistency
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
13
9
0
16 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
21
2
0
14 Mar 2023
Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection
Tao Chen
Ruirui Li
Jiafeng Fu
Daguang Jiang
17
1
0
09 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
41
44
0
06 Mar 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
33
0
0
28 Feb 2023
1
2
3
4
5
6
Next