Learning to Reason: End-to-End Module Networks for Visual Question Answering

18 April 2017
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
    KELM
    GNN
    ReLM
    LRM

Papers citing "Learning to Reason: End-to-End Module Networks for Visual Question Answering"

50 / 70 papers shown
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
Devon Jarvis
Richard Klein
Benjamin Rosman
Andrew M. Saxe
MLT
64
1
0
08 Mar 2025
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
71
2
0
20 Nov 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
52
6
0
12 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy
Sunshine Jiang
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
OOD
34
2
0
09 Sep 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report]
Hamed Ayoobi
Nico Potyka
Francesca Toni
30
2
0
26 Nov 2023
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
19
2
0
27 May 2023
Curriculum Learning for Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
24
3
0
27 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
429
0
14 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
89
11
0
03 Mar 2023
Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement
S. Imtiaz
Fraol Batole
Astha Singh
Rangeet Pan
Breno Dantas Cruz
Hridesh Rajan
11
7
0
09 Dec 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
22
1
0
22 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
43
399
0
18 Nov 2022
Neural Attentive Circuits
Nasim Rahaman
M. Weiß
Francesco Locatello
C. Pal
Yoshua Bengio
Bernhard Schölkopf
Erran L. Li
Nicolas Ballas
19
6
0
14 Oct 2022
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
25
82
0
13 Oct 2022
Binding Language Models in Symbolic Languages
Zhoujun Cheng
Tianbao Xie
Peng Shi
Chengzu Li
Rahul Nadkarni
...
Dragomir R. Radev
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Tao Yu
LMTD
109
197
0
06 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
25
10
0
04 Oct 2022
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Yudong Han
Liqiang Nie
Jianhua Yin
Jianlong Wu
Yan Yan
24
12
0
24 Jul 2022
What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning
Jae Hee Lee
Matthias Kerzel
Kyra Ahrens
C. Weber
S. Wermter
27
8
0
05 May 2022
METGEN: A Module-Based Entailment Tree Generation Framework for Answer Explanation
Ruixin Hong
Hongming Zhang
Xintong Yu
Changshui Zhang
ReLM
LRM
27
32
0
05 May 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
22
7
0
08 Feb 2022
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
22
2
0
28 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
16
137
0
17 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
46
858
0
26 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
26
74
0
07 Apr 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
32
169
0
13 Jan 2021
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Yunqiu Xu
Meng Fang
Ling-Hao Chen
Yali Du
Joey Tianyi Zhou
Chengqi Zhang
OffRL
15
44
0
22 Oct 2020
Object-and-Action Aware Model for Visual Language Navigation
Yuankai Qi
Zizheng Pan
Shengping Zhang
A. Hengel
Qi Wu
LM&Ro
18
111
0
29 Jul 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
11
36
0
28 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
42
93
0
19 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
21
73
0
17 Jul 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
24
487
0
11 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
18
19
0
30 Apr 2020
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
19
12
0
26 Dec 2019
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
19
155
0
16 Dec 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
19
38
0
11 Oct 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
36
347
0
25 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
52
2,447
0
20 Aug 2019
A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning
Minghao Hu
Yuxing Peng
Zhen Huang
Dongsheng Li
AIMat
LRM
19
90
0
15 Aug 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
26
24
0
08 Jul 2019
Compositional generalization in a deep seq2seq model by separating syntax and semantics
Jacob Russin
Jason Jo
R. C. O'Reilly
Yoshua Bengio
19
102
0
22 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
11
77
0
18 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
24
86
0
07 Mar 2019
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang
Feng Gao
Baoxiong Jia
Yixin Zhu
Song-Chun Zhu
AIMat
14
303
0
07 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
17
82
0
01 Mar 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
31
321
0
20 Jan 2019
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
A. Hengel
ObjD
22
252
0
12 Dec 2018