DenseCap: Fully Convolutional Localization Networks for Dense Captioning

24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
    VLM

Papers citing "DenseCap: Fully Convolutional Localization Networks for Dense Captioning"

50 / 468 papers shown
Title
Spatial Memory for Context Reasoning in Object Detection
Xinlei Chen
Abhinav Gupta
ObjD
192
170
0
13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Xicheng Zhang
Luyao Yuan
Yijie Guo
Zhiyuan He
I-An Huang
Honglak Lee
ObjD
152
59
0
12 Apr 2017
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Zhou Ren
Xiaoyu Wang
Ning Zhang
Xutao Lv
Li Li
138
333
0
12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision
Siddha Ganju
Olga Russakovsky
Abhinav Gupta
169
16
0
12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders
Unnat Jain
Ziyu Zhang
Alex Schwing
163
157
0
11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
205
530
0
11 Apr 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
V. Kazemi
Ali Elqursh
OOD
148
193
0
11 Apr 2017
Generating Descriptions with Grounded and Co-Referenced People
Anna Rohrbach
Marcus Rohrbach
Siyu Tang
Seong Joon Oh
Bernt Schiele
556
72
0
05 Apr 2017
Weakly Supervised Dense Video Captioning
Zhiqiang Shen
Jianguo Li
Zhou Su
Minjun Li
Yurong Chen
Yu-Gang Jiang
Xiangyang Xue
183
140
0
05 Apr 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta
Kevin J. Shih
Saurabh Singh
Derek Hoiem
253
26
0
02 Apr 2017
Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention
Jinkyu Kim
John F. Canny
FAtt XAI OOD MILM CML
190
352
0
30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA ELM
381
872
0
29 Mar 2017
Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections
T. Wilkinson
Jonas Lindström
Anders Brun
128
39
0
22 Mar 2017
An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning
Fan Wu
Zhongwen Xu
Yi Yang
ObjD
112
11
0
22 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang
Zhiting Hu
Huatian Zhang
Chuang Gan
Eric Xing
GAN
177
213
0
21 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
298
429
0
20 Mar 2017
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric Xing
233
256
0
08 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
170
55
0
07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Computer Vision and Pattern Recognition (CVPR), 2017
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
351
583
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
161
49
0
23 Feb 2017
Person Search with Natural Language Description
Computer Vision and Pattern Recognition (CVPR), 2017
Shuang Li
Tong Xiao
Jiaming Song
Bolei Zhou
Dayu Yue
Xiaogang Wang
207
492
0
19 Feb 2017
Learning to Detect Human-Object Interactions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2017
Yu-Wei Chao
Yunfan Liu
Michael Xieyang Liu
Huayi Zeng
Gaowen Liu
218
575
0
17 Feb 2017
Gated Multimodal Units for Information Fusion
International Conference on Learning Representations (ICLR), 2017
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
577
463
0
07 Feb 2017
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
Xinyu Li
Yanyi Zhang
Jianyu Zhang
Shuhong Chen
I. Marsic
Richard A. Farneth
R. Burd
HAI
108
42
0
06 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis
Annual Meeting of the Association for Computational Linguistics (ACL), 2017
David Harwath
James R. Glass
219
107
0
25 Jan 2017
Incremental Learning for Robot Perception through HRI
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017
Sepehr Valipour
C. P. Quintero
Martin Jägersand
SSL CLL
105
34
0
17 Jan 2017
Comprehension-guided referring expressions
Computer Vision and Pattern Recognition (CVPR), 2017
Ruotian Luo
Gregory Shakhnarovich
ObjD
179
180
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Computer Vision and Pattern Recognition (CVPR), 2016
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
184
288
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Computer Vision and Pattern Recognition (CVPR), 2016
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
163
147
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2016
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
229
147
0
21 Dec 2016
Automatic Generation of Grounded Visual Questions
International Joint Conference on Artificial Intelligence (IJCAI), 2016
Shijie Zhang
Zhuang Li
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
195
84
0
20 Dec 2016
Sparse Factorization Layers for Neural Networks with Limited Supervision
Parker A. Koch
Jason J. Corso
71
2
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM SSeg
187
170
0
05 Dec 2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies
Junjie Zhang
Qi Wu
Chunhua Shen
Jian Zhang
Jianfeng Lu
193
176
0
04 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
244
215
0
03 Dec 2016
Training Bit Fully Convolutional Network for Fast Semantic Segmentation
He Wen
Shuchang Zhou
Zhe Liang
Yuxiang Zhang
Dieqiao Feng
Xinyu Zhou
Cong Yao
MQ SSeg
143
10
0
01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
181
420
0
30 Nov 2016
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
Timur M. Bagautdinov
Alexandre Alahi
François Fleuret
Pascal Fua
Silvio Savarese
160
230
0
28 Nov 2016
DeepSetNet: Predicting Sets with Deep Neural Networks
S. Hamid Rezatofighi
B. V. Kumar
Anton Milan
Ehsan Abbasnejad
A. Dick
Ian Reid
BDL
259
53
0
28 Nov 2016
Grad-CAM: Why did you say that?
Ramprasaath R. Selvaraju
Abhishek Das
Ramakrishna Vedantam
Michael Cogswell
Devi Parikh
Dhruv Batra
FAtt
313
555
0
22 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content
Karl S. Ni
Kyle Zaragoza
Charles Foster
C. Carrano
Barry Y. Chen
Yonas Tesfaye
A. Gude
139
6
0
21 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li Li
VLM
210
177
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
Anjali Narayan-Chen
Svetlana Lazebnik
315
191
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
205
398
0
20 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
132
10
0
20 Nov 2016
Convolutional Gated Recurrent Networks for Video Segmentation
Mennatullah Siam
Sepehr Valipour
Martin Jägersand
Nilanjan Ray
VOS
288
104
0
16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction
Yilin Song
J. Viventi
Yao Wang
AI4TS
92
2
0
15 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama
Noriki Nishida
350
62
0
14 Nov 2016
Memory-augmented Attention Modelling for Videos
Rasool Fakoor
Abdel-rahman Mohamed
Margaret Mitchell
S. B. Kang
Pushmeet Kohli
260
20
0
07 Nov 2016
Spatio-Temporal Attention Models for Grounded Video Captioning
M. Zanfir
Elisabeta Marinoiu
C. Sminchisescu
208
51
0
17 Oct 2016