DenseCap: Fully Convolutional Localization Networks for Dense Captioning

24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
    VLM

Papers citing "DenseCap: Fully Convolutional Localization Networks for Dense Captioning"

50 / 468 papers shown
Anomaly Detection in Video Sequence with Appearance-Motion Correspondence
IEEE International Conference on Computer Vision (ICCV), 2019
Trong-Nguyen Nguyen
J. Meunier
219
403
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
IEEE International Conference on Computer Vision (ICCV), 2019
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt, UQCV
334
78
0
17 Aug 2019
Survey on Deep Neural Networks in Speech and Vision Systems
M. Alam
Manar D. Samad
Lasitha Vidyaratne
Alexander M. Glandon
Khan M. Iftekharuddin
3DV, VLM, AI4TS
369
224
0
16 Aug 2019
Image Captioning using Facial Expression and Attention
Journal of Artificial Intelligence Research (JAIR), 2019
Omid Mohamad Nezami
Mark Dras
Stephen Wan
Cécile Paris
CVBM
203
11
0
08 Aug 2019
Addressing Data Bias Problems for Chest X-ray Image Report Generation
British Machine Vision Conference (BMVC), 2019
Philipp Harzig
Yan-Ying Chen
Francine Chen
Rainer Lienhart
MedIm
152
55
0
06 Aug 2019
Logic could be learned from images
International Journal of Machine Learning and Cybernetics (IJMLC), 2019
Q. Guo
Y. Qian
Xinyan Liang
Yanhong She
Deyu Li
Jiye Liang
NAI
180
4
0
06 Aug 2019
Cascaded Revision Network for Novel Object Captioning
Qianyu Feng
Yu Wu
Hehe Fan
C. Yan
Yezhou Yang
129
38
0
06 Aug 2019
Prediction and Description of Near-Future Activities in Video
Computer Vision and Image Understanding (CVIU), 2019
T. Mahmud
Mohammad Billah
Mahmudul Hasan
Amit K. Roy-Chowdhury
379
17
0
02 Aug 2019
Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation
ACM Multimedia (ACM MM), 2019
Yadan Luo
Zi Huang
Zheng Zhang
Ziwei Wang
Jingjing Li
Yang Yang
105
40
0
01 Aug 2019
ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences
ACM Multimedia (ACM MM), 2019
Zhizhong Han
Chao Chen
Yu-Shen Liu
Matthias Zwicker
3DPC
185
50
0
31 Jul 2019
Real-time Visual Object Tracking with Natural Language Description
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Qi Feng
Vitaly Ablavsky
Qinxun Bai
Guorong Li
Stan Sclaroff
VLM, ObjD, VOT
272
67
0
26 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
404
142
0
22 Jul 2019
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Jian Zheng
S. Krishnamurthy
Ruxin Chen
Min-Hung Chen
Zhenhao Ge
Xiaohua Li
135
4
0
16 Jun 2019
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
IEEE Transactions on Image Processing (TIP), 2019
Somdyuti Paul
A. Norkin
A. Bovik
127
15
0
15 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Hyounghun Kim
Joey Tianyi Zhou
CoGe
107
21
0
14 Jun 2019
Image Captioning: Transforming Objects into Words
Neural Information Processing Systems (NeurIPS), 2019
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
436
545
0
14 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
510
1,364
0
07 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
166
133
0
06 Jun 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
214
15
0
28 May 2019
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
Pattern Recognition Letters (PR), 2019
Arka Ujjal Dey
Suman K. Ghosh
Ernest Valveny
Gaurav Harit
190
26
0
25 May 2019
AttentionRNN: A Structured Spatial Attention Mechanism
IEEE International Conference on Computer Vision (ICCV), 2019
Siddhesh Khandelwal
Leonid Sigal
187
3
0
22 May 2019
Joint Object and State Recognition using Language Knowledge
IEEE International Conference on Image Processing (ICIP), 2019
Ahmad Babaeian Jelodar
Yu Sun
170
18
0
13 May 2019
Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
Conference on Multimedia Information Processing and Retrieval (MIPR), 2019
Philipp Harzig
D. Zecha
Rainer Lienhart
Carolin Kaiser
René Schallner
74
3
0
06 May 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Jiayuan Mao
Chuang Gan
Pushmeet Kohli
J. Tenenbaum
Jiajun Wu
NAI
492
780
0
26 Apr 2019
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
191
42
0
19 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
200
79
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
237
119
0
11 Apr 2019
Modularized Textual Grounding for Counterfactual Resilience
Zhiyuan Fang
Shu Kong
Charless C. Fowlkes
Yezhou Yang
193
33
0
07 Apr 2019
VQD: Visual Query Detection in Natural Scenes
Manoj Acharya
Karan Jariwala
Christopher Kanan
ObjD
184
18
0
04 Apr 2019
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
135
83
0
02 Apr 2019
Recurrent Back-Projection Network for Video Super-Resolution
Muhammad Haris
Gregory Shakhnarovich
Norimichi Ukita
SupR
164
476
0
25 Mar 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Computer Vision and Pattern Recognition (CVPR), 2019
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
215
54
0
18 Mar 2019
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
325
92
0
14 Mar 2019
Learning To Follow Directions in Street View
AAAI Conference on Artificial Intelligence (AAAI), 2019
Karl Moritz Hermann
Mateusz Malinowski
Piotr Wojciech Mirowski
Andras Banki-Horvath
Keith Anderson
R. Hadsell
SSL
289
73
0
01 Mar 2019
CHIP: Channel-wise Disentangled Interpretation of Deep Convolutional Neural Networks
Xinrui Cui
Dan Wang
F. I. Z. Jane Wang
FAtt, BDL
150
13
0
07 Feb 2019
Linearized Multi-Sampling for Differentiable Image Transformation
Wei Jiang
Weiwei Sun
Andrea Tagliasacchi
Eduard Trulls
K. M. Yi
217
24
0
22 Jan 2019
LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators
Jianan Li
Jimei Yang
Aaron Hertzmann
Jianming Zhang
Tingfa Xu
GAN
315
261
0
21 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
356
348
0
20 Jan 2019
Toward Explainable Fashion Recommendation
Pongsate Tangseng
Takayuki Okatani
161
33
0
15 Jan 2019
Epipolar Geometry based Learning of Multi-view Depth and Ego-Motion from Monocular Sequences
V. Prasad
Dipanjan Das
Brojeshwar Bhowmick
MDE
210
9
0
23 Dec 2018
SfMLearner++: Learning Monocular Depth & Ego-Motion using Meaningful Geometric Constraints
V. Prasad
Brojeshwar Bhowmick
MDE
175
26
0
20 Dec 2018
Detecting unseen visual relations using analogies
Julia Peyre
Ivan Laptev
Cordelia Schmid
Josef Sivic
138
18
0
13 Dec 2018
Visual Social Relationship Recognition
Junnan Li
Yongkang Wong
Qi Zhao
Mohan Kankanhalli
127
28
0
13 Dec 2018
Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification
Xiu-Shen Wei
Chen-Da Liu-Zhang
Lingqiao Liu
Chunhua Shen
Jianxin Wu
185
44
0
11 Dec 2018
Neural Word Search in Historical Manuscript Collections
T. Wilkinson
Jonas Lindström
Anders Brun
3DV
123
9
0
06 Dec 2018
Interactive Full Image Segmentation by Considering All Regions Jointly
E. Agustsson
J. Uijlings
V. Ferrari
VLM
258
77
0
05 Dec 2018
Visual Question Answering as Reading Comprehension
Hui Li
Peng Wang
Chunhua Shen
Anton Van Den Hengel
134
46
0
29 Nov 2018
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding
Hassan Akbari
Svebor Karaman
Surabhi Bhargava
Brian Chen
Carl Vondrick
Shih-Fu Chang
149
86
0
28 Nov 2018
MIST: Multiple Instance Spatial Transformer Network
Baptiste Angles
Shahram Izadi
Simon Kornblith
Andrea Tagliasacchi
K. M. Yi
335
5
0
26 Nov 2018
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
271
194
0
26 Nov 2018