1511.07571
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
Papers citing
"DenseCap: Fully Convolutional Localization Networks for Dense Captioning"
50 / 467 papers shown
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
148
1
0
20 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
219
17
0
14 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
200
13
0
08 Dec 2023
Towards More Unified In-context Visual Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Dianmo Sheng
DongDong Chen
Zhentao Tan
Qiankun Liu
Qi Chu
Jianmin Bao
Tao Gong
Bin Liu
Shengwei Xu
Nenghai Yu
MLLM
VLM
168
13
0
05 Dec 2023
Object Recognition as Next Token Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
410
12
0
04 Dec 2023
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
198
31
0
01 Dec 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
143
4
0
29 Nov 2023
GOAT: GO to Any Thing
Matthew Chang
Théophile Gervet
Mukul Khanna
Sriram Yenamandra
Dhruv Shah
...
Saurabh Gupta
Dhruv Batra
Roozbeh Mottaghi
Jitendra Malik
Devendra Singh Chaplot
306
109
0
10 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
273
18
0
02 Nov 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
British Machine Vision Conference (BMVC), 2023
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
138
4
0
30 Oct 2023
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Neural Information Processing Systems (NeurIPS), 2023
Hejie Cui
Xinyu Fang
Zihan Zhang
Ran Xu
Xuan Kan
Xin Liu
Yue Yu
Manling Li
Yangqiu Song
Carl Yang
VLM
133
6
0
28 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
137
1
0
18 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
217
28
0
29 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
194
72
0
12 Sep 2023
Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul
S. Larabi
209
2
0
26 Aug 2023
Dense Text-to-Image Generation with Attention Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Yunji Kim
Jiyoung Lee
Jin-Hwa Kim
Jung-Woo Ha
Jun-Yan Zhu
DiffM
233
179
0
24 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
IEEE International Conference on Computer Vision (ICCV), 2023
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
168
34
0
15 Aug 2023
TS-RGBD Dataset: a Novel Dataset for Theatre Scenes Description for People with Visual Impairments
Leyla Benhamida
Khadidja Delloul
S. Larabi
126
1
0
02 Aug 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
206
1
0
14 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
717
307
0
07 Jul 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
140
9
0
25 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
International Conference on Learning Representations (ICLR), 2023
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
250
7
0
20 Jun 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
304
50
0
28 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
128
3
0
24 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
142
4
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
ACM Multimedia (ACM MM), 2023
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
164
3
0
19 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
370
122
0
04 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
Jiafeng Guo
Xueqi Cheng
LRM
210
4
0
03 May 2023
Interactive and Explainable Region-guided Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
197
169
0
17 Apr 2023
Expressive Text-to-Image Generation with Rich Text
IEEE International Conference on Computer Vision (ICCV), 2023
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
397
97
0
13 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
Computer Vision and Pattern Recognition (CVPR), 2023
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
129
2
0
13 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
150
55
0
10 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
238
8
0
04 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
316
41
0
28 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Shih-Han Chou
James J. Little
Leonid Sigal
138
3
0
14 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
161
44
0
04 Mar 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
146
4
0
08 Feb 2023
IC3: Image Captioning by Committee Consensus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
248
23
0
02 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
IEEE Access (IEEE Access), 2023
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
123
9
0
26 Jan 2023
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
145
16
0
01 Dec 2022
GRiT: A Generative Region-to-text Transformer for Object Understanding
European Conference on Computer Vision (ECCV), 2022
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD
VLM
199
145
0
01 Dec 2022
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Tanzila Rahman
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Shweta Mahajan
Leonid Sigal
DiffM
293
90
0
23 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
126
9
0
24 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
154
17
0
08 Oct 2022
DRAMA: Joint Risk Localization and Captioning in Driving
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
266
144
0
22 Sep 2022
Rethinking the Reference-based Distinctive Image Captioning
ACM Multimedia (ACM MM), 2022
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
181
23
0
22 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Asian Conference on Computer Vision (ACCV), 2022
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
294
30
0
20 Jul 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Terrance Boult
Jugal Kalita
197
0
0
28 Jun 2022
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zihao Zhu
NAI
ReLM
GNN
177
4
0
25 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Computer Vision and Image Understanding (CVIU), 2022
Qinjie Zheng
Chaoyue Wang
Dadong Wang
186
1
0
21 Jun 2022