DenseCap: Fully Convolutional Localization Networks for Dense Captioning

24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
    VLM

Papers citing "DenseCap: Fully Convolutional Localization Networks for Dense Captioning"

50 / 467 papers shown
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
148
1
0
20 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM, VLM
219
17
0
14 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
200
13
0
08 Dec 2023
Towards More Unified In-context Visual Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Dianmo Sheng
DongDong Chen
Zhentao Tan
Qiankun Liu
Qi Chu
Jianmin Bao
Tao Gong
Bin Liu
Shengwei Xu
Nenghai Yu
MLLM, VLM
168
13
0
05 Dec 2023
Object Recognition as Next Token Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
410
12
0
04 Dec 2023
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM, VLM
198
31
0
01 Dec 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
143
4
0
29 Nov 2023
GOAT: GO to Any Thing
Matthew Chang
Théophile Gervet
Mukul Khanna
Sriram Yenamandra
Dhruv Shah
...
Saurabh Gupta
Dhruv Batra
Roozbeh Mottaghi
Jitendra Malik
Devendra Singh Chaplot
306
109
0
10 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
273
18
0
02 Nov 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
British Machine Vision Conference (BMVC), 2023
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
138
4
0
30 Oct 2023
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Neural Information Processing Systems (NeurIPS), 2023
Hejie Cui
Xinyu Fang
Zihan Zhang
Ran Xu
Xuan Kan
Xin Liu
Yue Yu
Manling Li
Yangqiu Song
Carl Yang
VLM
133
6
0
28 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
137
1
0
18 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
217
28
0
29 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
194
72
0
12 Sep 2023
Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul
S. Larabi
209
2
0
26 Aug 2023
Dense Text-to-Image Generation with Attention Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Yunji Kim
Jiyoung Lee
Jin-Hwa Kim
Jung-Woo Ha
Jun-Yan Zhu
DiffM
233
179
0
24 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
IEEE International Conference on Computer Vision (ICCV), 2023
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
168
34
0
15 Aug 2023
TS-RGBD Dataset: a Novel Dataset for Theatre Scenes Description for People with Visual Impairments
Leyla Benhamida
Khadidja Delloul
S. Larabi
126
1
0
02 Aug 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
206
1
0
14 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
717
307
0
07 Jul 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
140
9
0
25 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
International Conference on Learning Representations (ICLR), 2023
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
250
7
0
20 Jun 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
304
50
0
28 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
128
3
0
24 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
142
4
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
ACM Multimedia (ACM MM), 2023
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
164
3
0
19 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
370
122
0
04 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
Jiafeng Guo
Xueqi Cheng
LRM
210
4
0
03 May 2023
Interactive and Explainable Region-guided Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
197
169
0
17 Apr 2023
Expressive Text-to-Image Generation with Rich Text
IEEE International Conference on Computer Vision (ICCV), 2023
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
397
97
0
13 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
Computer Vision and Pattern Recognition (CVPR), 2023
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
129
2
0
13 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
150
55
0
10 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM, AI4TS
238
8
0
04 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
316
41
0
28 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Shih-Han Chou
James J. Little
Leonid Sigal
138
3
0
14 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD, VLM
161
44
0
04 Mar 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
146
4
0
08 Feb 2023
IC3: Image Captioning by Committee Consensus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
248
23
0
02 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
IEEE Access (IEEE Access), 2023
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL, VLM
123
9
0
26 Jan 2023
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
145
16
0
01 Dec 2022
GRiT: A Generative Region-to-text Transformer for Object Understanding
European Conference on Computer Vision (ECCV), 2022
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD, VLM
199
145
0
01 Dec 2022
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Tanzila Rahman
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Shweta Mahajan
Leonid Sigal
DiffM
293
90
0
23 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
126
9
0
24 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
154
17
0
08 Oct 2022
DRAMA: Joint Risk Localization and Captioning in Driving
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
266
144
0
22 Sep 2022
Rethinking the Reference-based Distinctive Image Captioning
ACM Multimedia (ACM MM), 2022
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
181
23
0
22 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Asian Conference on Computer Vision (ACCV), 2022
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
294
30
0
20 Jul 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Terrance Boult
Jugal Kalita
197
0
0
28 Jun 2022
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zihao Zhu
NAI, ReLM, GNN
177
4
0
25 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Computer Vision and Image Understanding (CVIU), 2022
Qinjie Zheng
Chaoyue Wang
Dadong Wang
186
1
0
21 Jun 2022