Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1704.05526
Cited By
Learning to Reason: End-to-End Module Networks for Visual Question Answering
18 April 2017
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM
GNN
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning to Reason: End-to-End Module Networks for Visual Question Answering"
50 / 73 papers shown
Title
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
Devon Jarvis
Richard Klein
Benjamin Rosman
Andrew M. Saxe
MLT
64
1
0
08 Mar 2025
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
71
2
0
20 Nov 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
52
6
0
12 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy
Sunshine Jiang
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
OOD
34
2
0
09 Sep 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report]
Hamed Ayoobi
Nico Potyka
Francesca Toni
33
2
0
26 Nov 2023
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
19
2
0
27 May 2023
Curriculum Learning for Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
24
3
0
27 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
429
0
14 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
89
11
0
03 Mar 2023
Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement
S. Imtiaz
Fraol Batole
Astha Singh
Rangeet Pan
Breno Dantas Cruz
Hridesh Rajan
11
7
0
09 Dec 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
22
1
0
22 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
43
399
0
18 Nov 2022
Neural Attentive Circuits
Nasim Rahaman
M. Weiß
Francesco Locatello
C. Pal
Yoshua Bengio
Bernhard Schölkopf
Erran L. Li
Nicolas Ballas
19
6
0
14 Oct 2022
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
25
82
0
13 Oct 2022
Binding Language Models in Symbolic Languages
Zhoujun Cheng
Tianbao Xie
Peng Shi
Chengzu Li
Rahul Nadkarni
...
Dragomir R. Radev
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Tao Yu
LMTD
109
197
0
06 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
25
10
0
04 Oct 2022
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Yudong Han
Liqiang Nie
Jianhua Yin
Jianlong Wu
Yan Yan
24
12
0
24 Jul 2022
What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning
Jae Hee Lee
Matthias Kerzel
Kyra Ahrens
C. Weber
S. Wermter
27
8
0
05 May 2022
METGEN: A Module-Based Entailment Tree Generation Framework for Answer Explanation
Ruixin Hong
Hongming Zhang
Xintong Yu
Changshui Zhang
ReLM
LRM
30
32
0
05 May 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
22
7
0
08 Feb 2022
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
22
2
0
28 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
16
137
0
17 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
49
858
0
26 Apr 2021
Object-Centric Representation Learning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
19
7
0
12 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
26
74
0
07 Apr 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
32
169
0
13 Jan 2021
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Yunqiu Xu
Meng Fang
Ling-Hao Chen
Yali Du
Joey Tianyi Zhou
Chengqi Zhang
OffRL
17
44
0
22 Oct 2020
Object-and-Action Aware Model for Visual Language Navigation
Yuankai Qi
Zizheng Pan
Shengping Zhang
A. Hengel
Qi Wu
LM&Ro
18
111
0
29 Jul 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
42
93
0
19 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
21
73
0
17 Jul 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
24
487
0
11 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
18
19
0
30 Apr 2020
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
21
12
0
26 Dec 2019
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
19
155
0
16 Dec 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
19
38
0
11 Oct 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
36
347
0
25 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Mohit Bansal
VLM
MLLM
55
2,447
0
20 Aug 2019
A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning
Minghao Hu
Yuxing Peng
Zhen Huang
Dongsheng Li
AIMat
LRM
19
90
0
15 Aug 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
26
24
0
08 Jul 2019
Compositional generalization in a deep seq2seq model by separating syntax and semantics
Jacob Russin
Jason Jo
R. C. O'Reilly
Yoshua Bengio
19
102
0
22 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
11
77
0
18 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
24
86
0
07 Mar 2019
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang
Feng Gao
Baoxiong Jia
Yixin Zhu
Song-Chun Zhu
AIMat
14
303
0
07 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
17
82
0
01 Mar 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
16
271
0
25 Feb 2019
1
2
Next