ResearchTrend.AI
DenseCap: Fully Convolutional Localization Networks for Dense Captioning


24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
    VLM

Papers citing "DenseCap: Fully Convolutional Localization Networks for Dense Captioning"

50 / 452 papers shown
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric P. Xing
13
250
0
08 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
14
56
0
07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
14
48
0
23 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
19
385
0
19 Feb 2017
Learning to Detect Human-Object Interactions
Yu-Wei Chao
Yunfan Liu
Michael Xieyang Liu
Huayi Zeng
Jia Deng
6
501
0
17 Feb 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
M. Montes-y-Gómez
Fabio Gonzalez
16
370
0
07 Feb 2017
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
Xinyu Li
Yanyi Zhang
Jianyu Zhang
Shuhong Chen
I. Marsic
Richard A. Farneth
R. Burd
HAI
10
35
0
06 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis
David F. Harwath
James R. Glass
24
106
0
25 Jan 2017
Incremental Learning for Robot Perception through HRI
Sepehr Valipour
C. P. Quintero
Martin Jägersand
SSL
CLL
8
32
0
17 Jan 2017
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
21
171
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Mohit Bansal
Tamara L. Berg
ObjD
21
272
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
11
142
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
15
132
0
21 Dec 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang
Lizhen Qu
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
11
79
0
20 Dec 2016
Sparse Factorization Layers for Neural Networks with Limited Supervision
Parker A. Koch
Jason J. Corso
13
2
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
28
165
0
05 Dec 2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies
Junjie Zhang
Qi Wu
Chunhua Shen
Jian Andrew Zhang
Jianfeng Lu
17
165
0
04 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
25
205
0
03 Dec 2016
Training Bit Fully Convolutional Network for Fast Semantic Segmentation
He Wen
Shuchang Zhou
Zhe Liang
Yuxiang Zhang
Dieqiao Feng
Xinyu Zhou
Cong Yao
MQ
SSeg
29
10
0
01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
29
401
0
30 Nov 2016
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
Timur M. Bagautdinov
Alexandre Alahi
F. Fleuret
Pascal Fua
Silvio Savarese
19
217
0
28 Nov 2016
DeepSetNet: Predicting Sets with Deep Neural Networks
S. Hamid Rezatofighi
B. V. Kumar
Anton Milan
Ehsan Abbasnejad
A. Dick
Ian Reid
BDL
30
51
0
28 Nov 2016
Grad-CAM: Why did you say that?
Ramprasaath R. Selvaraju
Abhishek Das
Ramakrishna Vedantam
Michael Cogswell
Devi Parikh
Dhruv Batra
FAtt
15
462
0
22 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content
Karl S. Ni
Kyle Zaragoza
Charles Foster
C. Carrano
Barry Y. Chen
Yonas Tesfaye
A. Gude
12
6
0
21 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
19
169
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
17
189
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
19
373
0
20 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
27
10
0
20 Nov 2016
Convolutional Gated Recurrent Networks for Video Segmentation
Mennatullah Siam
Sepehr Valipour
Martin Jägersand
Nilanjan Ray
VOS
17
98
0
16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction
Yilin Song
J. Viventi
Yao Wang
AI4TS
28
2
0
15 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama
Noriki Nishida
16
62
0
14 Nov 2016
Memory-augmented Attention Modelling for Videos
Rasool Fakoor
Abdel-rahman Mohamed
Margaret Mitchell
S. B. Kang
Pushmeet Kohli
35
20
0
07 Nov 2016
Spatio-Temporal Attention Models for Grounded Video Captioning
M. Zanfir
Elisabeta Marinoiu
C. Sminchisescu
27
50
0
17 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
18
19,529
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
17
235
0
05 Oct 2016
Learning to generalize to new compositions in image understanding
Y. Atzmon
Jonathan Berant
Vahid Kazemi
Amir Globerson
Gal Chechik
18
67
0
27 Aug 2016
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
27
68
0
25 Aug 2016
Modeling Context Between Objects for Referring Expression Understanding
Varun K. Nagaraja
Vlad I. Morariu
Larry S. Davis
21
143
0
01 Aug 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
28
1,223
0
31 Jul 2016
Watch What You Just Said: Image Captioning with Text-Conditional Attention
Luowei Zhou
Chenliang Xu
Parker A. Koch
Jason J. Corso
VLM
6
44
0
15 Jun 2016
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Spyridon Gidaris
N. Komodakis
ObjD
19
79
0
14 Jun 2016
Deep neural networks are robust to weight binarization and other non-linear distortions
P. Merolla
R. Appuswamy
John V. Arthur
S. K. Esser
D. Modha
OOD
MQ
23
96
0
07 Jun 2016
Recurrent Fully Convolutional Networks for Video Segmentation
Sepehr Valipour
Mennatullah Siam
Martin Jägersand
Nilanjan Ray
VOS
19
89
0
01 Jun 2016
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Théodore Bluche
AI4TS
13
189
0
28 Apr 2016
Attributes as Semantic Units between Natural Language and Visual Recognition
Marcus Rohrbach
VLM
14
3
0
12 Apr 2016
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
Andrew Shin
Masataka Yamaguchi
Katsunori Ohnishi
Tatsuya Harada
40
8
0
30 Mar 2016
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
11
123
0
30 Mar 2016
BreakingNews: Article Annotation by Image and Text Processing
Arnau Ramisa
F. Yan
Francesc Moreno-Noguer
K. Mikolajczyk
21
105
0
23 Mar 2016
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
14
1,309
0
07 Nov 2015