Visual7W: Grounded Question Answering in Images

11 November 2015
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
arXiv:1511.03416

Papers citing "Visual7W: Grounded Question Answering in Images"

50 / 122 papers shown
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
28
183
0
25 Oct 2021
Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing
Qian Liu
Dejian Yang
Jiahui Zhang
Jiaqi Guo
Bin Zhou
Jian-Guang Lou
43
41
0
22 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
56
20
0
13 Sep 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
25
10
0
21 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
188
405
0
13 Jul 2021
Explanation-Based Human Debugging of NLP Models: A Survey
Piyawat Lertvittayakumjorn
Francesca Toni
LRM
28
79
0
30 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee Kenneth Wong
J. Tenenbaum
Chuang Gan
VGen
31
92
0
30 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
17
33
0
17 Mar 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
24
28
0
03 Feb 2021
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Antoni Rosinol
Andrew Violette
Marcus Abate
Nathan Hughes
Yun Chang
J. Shi
Arjun Gupta
Luca Carlone
3DV
23
220
0
18 Jan 2021
ORDNet: Capturing Omni-Range Dependencies for Scene Parsing
Shaofei Huang
Si Liu
Tianrui Hui
Jizhong Han
Bo-wen Li
Jiashi Feng
Shuicheng Yan
3DPC
OffRL
19
15
0
11 Jan 2021
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
22
119
0
30 Nov 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
24
56
0
27 Oct 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
J. Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
17
125
0
16 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
19
432
0
11 Jun 2020
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
22
14
0
01 May 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
21
17
0
20 Jan 2020
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
29
12
0
26 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Iro Armeni
Zhi-Yang He
JunYoung Gwak
Amir Zamir
Martin Fischer
Jitendra Malik
Silvio Savarese
3DV
3DPC
28
336
0
06 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
28
59
0
26 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
14
11
0
13 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
29
1,643
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
55
2,447
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
12
360
0
18 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
19
76
0
17 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Adversarial Multimodal Network for Movie Question Answering
Zhaoquan Yuan
Siyuan Sun
Lixin Duan
Xiao Wu
Changsheng Xu
19
3
0
24 Jun 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
23
227
0
25 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
28
117
0
11 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
13
128
0
02 Apr 2019
Constructing Hierarchical Q&A Datasets for Video Story Understanding
Y. Heo
Kyoung-Woon On
Seong-Ho Choi
Jaeseo Lim
Jinah Kim
Jeh-Kwang Ryu
Byung-Chull Bae
Byoung-Tak Zhang
17
4
0
01 Apr 2019
Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Ye Xia
Jinkyu Kim
John F. Canny
K. Zipser
D. Whitney
16
51
0
24 Mar 2019
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang
Feng Gao
Baoxiong Jia
Yixin Zhu
Song-Chun Zhu
AIMat
14
303
0
07 Mar 2019
CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
Gongjie Zhang
Shijian Lu
Wei Zhang
20
353
0
03 Mar 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
17
188
0
25 Jan 2019
Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey
W. Zhang
Quan Z. Sheng
A. Alhazmi
Chenliang Li
AAML
16
57
0
21 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
31
321
0
20 Jan 2019
Using Deep Learning for price prediction by exploiting stationary limit order book features
Avraam Tsantekidis
Nikolaos Passalis
Anastasios Tefas
J. Kanniainen
M. Gabbouj
Alexandros Iosifidis
OOD
13
88
0
23 Oct 2018
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
S. Ramakrishnan
Aishwarya Agrawal
Stefan Lee
AAML
20
235
0
08 Oct 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
Jianwei Yang
Jiasen Lu
Stefan Lee
Dhruv Batra
Devi Parikh
6
42
0
01 Oct 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
20
55
0
06 Sep 2018
Defoiling Foiled Image Captions
Pranava Madhyastha
Josiah Wang
Lucia Specia
22
9
0
16 May 2018
Did the Model Understand the Question?
Pramod Kaushik Mudrakarta
Ankur Taly
Mukund Sundararajan
Kedar Dhamdhere
ELM
OOD
FAtt
27
196
0
14 May 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
A. Schwing
16
40
0
29 Mar 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
18
240
0
29 Mar 2018
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
David Mascharka
Philip Tran
Ryan Soklaski
Arjun Majumdar
31
207
0
14 Mar 2018