1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 889 papers shown
Title
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
22
14
0
01 May 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Shafiq R. Joty
Michael R. Lyu
Irwin King
Caiming Xiong
S. Hoi
24
102
0
28 Apr 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Yin Cui
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
19
92
0
26 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
19
350
0
21 Apr 2020
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
M. Moghaddam
Qi Wu
Ehsan Abbasnejad
Javen Qinfeng Shi
15
4
0
07 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
30
436
0
02 Apr 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
17
232
0
01 Apr 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
31
235
0
31 Mar 2020
Learning Object Permanence from Video
Aviv Shamsian
Ofri Kleinfeld
Amir Globerson
Gal Chechik
SSL
34
31
0
23 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
173
22
0
22 Mar 2020
Affinity Graph Supervision for Visual Recognition
Chu Wang
Babak Samari
Vladimir G. Kim
S. Chaudhuri
Kaleem Siddiqi
GNN
19
8
0
19 Mar 2020
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
Karan Sikka
Andrew Silberfarb
John Byrnes
Indranil Sur
Edmond Chow
Ajay Divakaran
R. Rohwer
NAI
9
11
0
16 Mar 2020
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour
J. Schulze
Yi Yao
Avi Ziskind
Giedrius Burachas
24
27
0
01 Mar 2020
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
CML
22
680
0
27 Feb 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
A. Hengel
Liangwei Wang
31
91
0
24 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
22
181
0
20 Feb 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
Hang Xu
Linpu Fang
Xiaodan Liang
Wenxiong Kang
Zhenguo Li
ObjD
24
21
0
18 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
19
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Object Detection as a Positive-Unlabeled Problem
Yuewei Yang
Kevin J Liang
Lawrence Carin
19
37
0
11 Feb 2020
Controlling generative models with continuous factors of variations
Antoine Plumerault
Hervé Le Borgne
Céline Hudelot
DRL
24
126
0
28 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
29
258
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
24
67
0
15 Jan 2020
Cross-dataset Training for Class Increasing Object Detection
Yongqiang Yao
Yan Wang
Yu-Xiao Guo
Jiaojiao Lin
Hongwei Qin
Junjie Yan
ObjD
24
17
0
14 Jan 2020
Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning
Han-Jia Ye
Hong-You Chen
De-Chuan Zhan
Wei-Lun Chao
24
99
0
06 Jan 2020
Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
W. Ramos
M. Silva
Edson Roteia Araujo Junior
Alan C. Neves
Erickson R. Nascimento
14
6
0
29 Dec 2019
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
29
12
0
26 Dec 2019
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Huaizheng Zhang
Yong Luo
Qiming Ai
Yonggang Wen
17
15
0
21 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
14
868
0
17 Dec 2019
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
Jingwei Ji
Ranjay Krishna
Li Fei-Fei
Juan Carlos Niebles
39
335
0
15 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
19
14
0
06 Dec 2019
SOGNet: Scene Overlap Graph Network for Panoptic Segmentation
Yibo Yang
Hongyang Li
Xia Li
Qijie Zhao
Jianlong Wu
Zhouchen Lin
ISeg
19
62
0
18 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
31
9
0
31 Oct 2019
Identifying Unknown Instances for Autonomous Driving
K. Wong
Shenlong Wang
Mengye Ren
Ming Liang
R. Urtasun
19
110
0
24 Oct 2019
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Yihui He
Jianing Qian
Jianren Wang
Cindy X. Le
Congrui Hetang
Qi Lyu
Wenping Wang
Tianwei Yue
40
11
0
21 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
28
208
0
11 Oct 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Iro Armeni
Zhi-Yang He
JunYoung Gwak
Amir Zamir
Martin Fischer
Jitendra Malik
Silvio Savarese
3DV
3DPC
30
336
0
06 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
22
25
0
30 Sep 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
46
348
0
25 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
19
37
0
22 Sep 2019
Triplet-Aware Scene Graph Embeddings
Brigit Schroeder
Subarna Tripathi
Hanlin Tang
3DPC
25
16
0
19 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
22
11
0
13 Sep 2019
Specifying Object Attributes and Relations in Interactive Scene Generation
Oron Ashual
Lior Wolf
25
178
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
19
65
0
10 Sep 2019
Explainable Video Action Reasoning via Prior Knowledge and State Transitions
Tao Zhuo
Zhiyong Cheng
Peng Zhang
Yongkang Wong
Mohan S. Kankanhalli
FAtt
25
60
0
28 Aug 2019
Situational Fusion of Visual Representation for Visual Navigation
Bokui (William) Shen
Danfei Xu
Yuke Zhu
Leonidas J. Guibas
Fei-Fei Li
Silvio Savarese
SSL
22
62
0
24 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
29
1,649
0
22 Aug 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019