Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1511.03416
Cited By
Visual7W: Grounded Question Answering in Images
11 November 2015
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual7W: Grounded Question Answering in Images"
50 / 122 papers shown
Title
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
28
183
0
25 Oct 2021
Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing
Qian Liu
Dejian Yang
Jiahui Zhang
Jiaqi Guo
Bin Zhou
Jian-Guang Lou
43
41
0
22 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
56
20
0
13 Sep 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
25
10
0
21 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
188
405
0
13 Jul 2021
Explanation-Based Human Debugging of NLP Models: A Survey
Piyawat Lertvittayakumjorn
Francesca Toni
LRM
28
79
0
30 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee Kenneth Wong
J. Tenenbaum
Chuang Gan
VGen
31
92
0
30 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
17
33
0
17 Mar 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
24
28
0
03 Feb 2021
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Antoni Rosinol
Andrew Violette
Marcus Abate
Nathan Hughes
Yun Chang
J. Shi
Arjun Gupta
Luca Carlone
3DV
23
220
0
18 Jan 2021
ORDNet: Capturing Omni-Range Dependencies for Scene Parsing
Shaofei Huang
Si Liu
Tianrui Hui
Jizhong Han
Bo-wen Li
Jiashi Feng
Shuicheng Yan
3DPC
OffRL
19
15
0
11 Jan 2021
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
22
119
0
30 Nov 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
24
56
0
27 Oct 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
J. Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
17
125
0
16 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
19
432
0
11 Jun 2020
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
22
14
0
01 May 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
21
17
0
20 Jan 2020
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
29
12
0
26 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Iro Armeni
Zhi-Yang He
JunYoung Gwak
Amir Zamir
Martin Fischer
Jitendra Malik
Silvio Savarese
3DV
3DPC
28
336
0
06 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
28
59
0
26 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
14
11
0
13 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
29
1,643
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Mohit Bansal
VLM
MLLM
55
2,447
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
12
360
0
18 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
19
76
0
17 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Adversarial Multimodal Network for Movie Question Answering
Zhaoquan Yuan
Siyuan Sun
Lixin Duan
Xiao Wu
Changsheng Xu
19
3
0
24 Jun 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
23
227
0
25 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
28
117
0
11 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
13
128
0
02 Apr 2019
Constructing Hierarchical Q&A Datasets for Video Story Understanding
Y. Heo
Kyoung-Woon On
Seong-Ho Choi
Jaeseo Lim
Jinah Kim
Jeh-Kwang Ryu
Byung-Chull Bae
Byoung-Tak Zhang
17
4
0
01 Apr 2019
Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Ye Xia
Jinkyu Kim
John F. Canny
K. Zipser
D. Whitney
16
51
0
24 Mar 2019
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang
Feng Gao
Baoxiong Jia
Yixin Zhu
Song-Chun Zhu
AIMat
14
303
0
07 Mar 2019
CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
Gongjie Zhang
Shijian Lu
Wei Zhang
20
353
0
03 Mar 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
17
188
0
25 Jan 2019
Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey
W. Zhang
Quan Z. Sheng
A. Alhazmi
Chenliang Li
AAML
16
57
0
21 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
31
321
0
20 Jan 2019
Using Deep Learning for price prediction by exploiting stationary limit order book features
Avraam Tsantekidis
Nikolaos Passalis
Anastasios Tefas
J. Kanniainen
M. Gabbouj
Alexandros Iosifidis
OOD
13
88
0
23 Oct 2018
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
S. Ramakrishnan
Aishwarya Agrawal
Stefan Lee
AAML
20
235
0
08 Oct 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
Jianwei Yang
Jiasen Lu
Stefan Lee
Dhruv Batra
Devi Parikh
6
42
0
01 Oct 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
20
55
0
06 Sep 2018
Defoiling Foiled Image Captions
Pranava Madhyastha
Josiah Wang
Lucia Specia
22
9
0
16 May 2018
Did the Model Understand the Question?
Pramod Kaushik Mudrakarta
Ankur Taly
Mukund Sundararajan
Kedar Dhamdhere
ELM
OOD
FAtt
27
196
0
14 May 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
A. Schwing
16
40
0
29 Mar 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
18
240
0
29 Mar 2018
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
David Mascharka
Philip Tran
Ryan Soklaski
Arjun Majumdar
31
207
0
14 Mar 2018
Previous
1
2
3
Next