1511.07571
Cited By
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
Papers citing
"DenseCap: Fully Convolutional Localization Networks for Dense Captioning"
50 / 452 papers shown
Title
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric P. Xing
13
250
0
08 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
14
56
0
07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
14
48
0
23 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
19
385
0
19 Feb 2017
Learning to Detect Human-Object Interactions
Yu-Wei Chao
Yunfan Liu
Michael Xieyang Liu
Huayi Zeng
Jia Deng
6
501
0
17 Feb 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
M. Montes-y-Gómez
Fabio Gonzalez
16
370
0
07 Feb 2017
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
Xinyu Li
Yanyi Zhang
Jianyu Zhang
Shuhong Chen
I. Marsic
Richard A. Farneth
R. Burd
HAI
10
35
0
06 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis
David F. Harwath
James R. Glass
24
106
0
25 Jan 2017
Incremental Learning for Robot Perception through HRI
Sepehr Valipour
C. P. Quintero
Martin Jägersand
SSL
CLL
8
32
0
17 Jan 2017
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
21
171
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Mohit Bansal
Tamara L. Berg
ObjD
21
272
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
11
142
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
15
132
0
21 Dec 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang
Lizhen Qu
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
11
79
0
20 Dec 2016
Sparse Factorization Layers for Neural Networks with Limited Supervision
Parker A. Koch
Jason J. Corso
13
2
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
28
165
0
05 Dec 2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies
Junjie Zhang
Qi Wu
Chunhua Shen
Jian Andrew Zhang
Jianfeng Lu
17
165
0
04 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
25
205
0
03 Dec 2016
Training Bit Fully Convolutional Network for Fast Semantic Segmentation
He Wen
Shuchang Zhou
Zhe Liang
Yuxiang Zhang
Dieqiao Feng
Xinyu Zhou
Cong Yao
MQ
SSeg
29
10
0
01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
29
401
0
30 Nov 2016
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
Timur M. Bagautdinov
Alexandre Alahi
F. Fleuret
Pascal Fua
Silvio Savarese
19
217
0
28 Nov 2016
DeepSetNet: Predicting Sets with Deep Neural Networks
S. Hamid Rezatofighi
B. V. Kumar
Anton Milan
Ehsan Abbasnejad
A. Dick
Ian Reid
BDL
30
51
0
28 Nov 2016
Grad-CAM: Why did you say that?
Ramprasaath R. Selvaraju
Abhishek Das
Ramakrishna Vedantam
Michael Cogswell
Devi Parikh
Dhruv Batra
FAtt
15
462
0
22 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content
Karl S. Ni
Kyle Zaragoza
Charles Foster
C. Carrano
Barry Y. Chen
Yonas Tesfaye
A. Gude
12
6
0
21 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
19
169
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
17
189
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
19
373
0
20 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
27
10
0
20 Nov 2016
Convolutional Gated Recurrent Networks for Video Segmentation
Mennatullah Siam
Sepehr Valipour
Martin Jägersand
Nilanjan Ray
VOS
17
98
0
16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction
Yilin Song
J. Viventi
Yao Wang
AI4TS
28
2
0
15 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama
Noriki Nishida
16
62
0
14 Nov 2016
Memory-augmented Attention Modelling for Videos
Rasool Fakoor
Abdel-rahman Mohamed
Margaret Mitchell
S. B. Kang
Pushmeet Kohli
35
20
0
07 Nov 2016
Spatio-Temporal Attention Models for Grounded Video Captioning
M. Zanfir
Elisabeta Marinoiu
C. Sminchisescu
27
50
0
17 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
18
19,529
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
17
235
0
05 Oct 2016
Learning to generalize to new compositions in image understanding
Y. Atzmon
Jonathan Berant
Vahid Kazemi
Amir Globerson
Gal Chechik
18
67
0
27 Aug 2016
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
27
68
0
25 Aug 2016
Modeling Context Between Objects for Referring Expression Understanding
Varun K. Nagaraja
Vlad I. Morariu
Larry S. Davis
21
143
0
01 Aug 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
28
1,223
0
31 Jul 2016
Watch What You Just Said: Image Captioning with Text-Conditional Attention
Luowei Zhou
Chenliang Xu
Parker A. Koch
Jason J. Corso
VLM
6
44
0
15 Jun 2016
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Spyridon Gidaris
N. Komodakis
ObjD
19
79
0
14 Jun 2016
Deep neural networks are robust to weight binarization and other non-linear distortions
P. Merolla
R. Appuswamy
John V. Arthur
S. K. Esser
D. Modha
OOD
MQ
23
96
0
07 Jun 2016
Recurrent Fully Convolutional Networks for Video Segmentation
Sepehr Valipour
Mennatullah Siam
Martin Jägersand
Nilanjan Ray
VOS
19
89
0
01 Jun 2016
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Théodore Bluche
AI4TS
13
189
0
28 Apr 2016
Attributes as Semantic Units between Natural Language and Visual Recognition
Marcus Rohrbach
VLM
14
3
0
12 Apr 2016
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
Andrew Shin
Masataka Yamaguchi
Katsunori Ohnishi
Tatsuya Harada
40
8
0
30 Mar 2016
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
11
123
0
30 Mar 2016
BreakingNews: Article Annotation by Image and Text Processing
Arnau Ramisa
F. Yan
Francesc Moreno-Noguer
K. Mikolajczyk
21
105
0
23 Mar 2016
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
14
1,309
0
07 Nov 2015