Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 903 papers shown
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
19
65
0
10 Sep 2019
Explainable Video Action Reasoning via Prior Knowledge and State Transitions
Tao Zhuo
Zhiyong Cheng
Peng Zhang
Yongkang Wong
Mohan S. Kankanhalli
FAtt
25
60
0
28 Aug 2019
Situational Fusion of Visual Representation for Visual Navigation
Bokui (William) Shen
Danfei Xu
Yuke Zhu
Leonidas J. Guibas
Fei-Fei Li
Silvio Savarese
SSL
22
62
0
24 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
29
1,649
0
22 Aug 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
24
41
0
20 Aug 2019
Image Synthesis From Reconfigurable Layout and Style
Wei Sun
Tianfu Wu
27
140
0
20 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
58
2,450
0
20 Aug 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
16
147
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
156
0
20 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
20
22
0
19 Aug 2019
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
24
823
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
57
895
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
21
40
0
15 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
19
38
0
12 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
22
145
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
35
1,912
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
67
3,621
0
06 Aug 2019
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Longteng Guo
Jing Liu
Jinhui Tang
Jiangwei Li
W. Luo
Hanqing Lu
17
102
0
06 Aug 2019
Cascaded Revision Network for Novel Object Captioning
Qianyu Feng
Yu Wu
Hehe Fan
C. Yan
Yezhou Yang
24
35
0
06 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM
BDL
DiffM
19
36
0
01 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
25
62
0
21 Jul 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Jonathan Gray
Kavya Srinet
Yacine Jernite
Haonan Yu
Zhuoyuan Chen
Demi Guo
Siddharth Goyal
C. L. Zitnick
Arthur Szlam
28
38
0
19 Jul 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
16
19
0
22 Jun 2019
Does Learning Require Memorization? A Short Tale about a Long Tail
Vitaly Feldman
TDI
21
481
0
12 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
25
119
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
25
343
0
31 May 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
26
14
0
28 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
24
159
0
24 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
28
81
0
15 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
23
171
0
10 May 2019
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Yikang Li
Tao Ma
Yeqi Bai
Nan Duan
Sining Wei
Xiaogang Wang
20
93
0
05 May 2019
On Exploring Undetermined Relationships for Visual Relationship Detection
Yibing Zhan
Jun-chen Yu
Ting Yu
Dacheng Tao
23
81
0
05 May 2019
Scene Graph Prediction with Limited Labels
V. Chen
P. Varma
Ranjay Krishna
Michael S. Bernstein
Christopher Ré
Li Fei-Fei
8
86
0
25 Apr 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
31
227
0
25 Apr 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
15
94
0
21 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
17
77
0
18 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
25
6
0
15 Apr 2019
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Fethiye Irmak Dogan
Sinan Kalkan
Iolanda Leite
18
19
0
15 Apr 2019
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
Weicheng Kuo
A. Angelova
Jitendra Malik
Tsung-Yi Lin
3DPC
ISeg
24
117
0
05 Apr 2019
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
16
75
0
02 Apr 2019
MMKG: Multi-Modal Knowledge Graphs
Ye Liu
Hui Li
Alberto García-Durán
Mathias Niepert
Daniel Oñoro-Rubio
David S. Rosenblum
16
193
0
13 Mar 2019
Visual Semantic Information Pursuit: A Survey
Daqi Liu
M. Bober
J. Kittler
15
31
0
13 Mar 2019
Knowledge-Embedded Routing Network for Scene Graph Generation
Tianshui Chen
Weihao Yu
Riquan Chen
Liang Lin
GNN
32
371
0
08 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
17
82
0
01 Mar 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
19
271
0
25 Feb 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
11
104
0
01 Feb 2019
Adversarial Adaptation of Scene Graph Models for Understanding Civic Issues
Shanu Kumar
Shubham Atreja
Anjali Singh
Mohit Jain
14
12
0
29 Jan 2019