Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1511.07571
Cited By
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DenseCap: Fully Convolutional Localization Networks for Dense Captioning"
50 / 452 papers shown
Title
Dense Text-to-Image Generation with Attention Modulation
Yunji Kim
Jiyoung Lee
Jin-Hwa Kim
Jung-Woo Ha
Jun-Yan Zhu
DiffM
36
134
0
24 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
24
19
0
15 Aug 2023
TS-RGBD Dataset: a Novel Dataset for Theatre Scenes Description for People with Visual Impairments
Leyla Benhamida
Khadidja Delloul
S. Larabi
11
1
0
02 Aug 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
120
1
0
14 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
83
224
0
07 Jul 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
11
9
0
25 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
20
2
0
20 Jun 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
24
26
0
28 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
P. Sadler
David Schlangen
19
2
0
24 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
J. Liu
13
1
0
19 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
102
82
0
04 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
51
1
0
03 May 2023
Interactive and Explainable Region-guided Radiology Report Generation
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
24
110
0
17 Apr 2023
Expressive Text-to-Image Generation with Rich Text
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
77
79
0
13 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
19
2
0
13 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
22
33
0
10 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
28
6
0
04 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
19
31
0
28 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
21
2
0
14 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
12
31
0
04 Mar 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
23
4
0
08 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
19
17
0
02 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
27
4
0
26 Jan 2023
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
8
13
0
01 Dec 2022
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD
VLM
14
111
0
01 Dec 2022
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Tanzila Rahman
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Shweta Mahajan
Leonid Sigal
DiffM
19
68
0
23 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
27
6
0
24 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
44
15
0
08 Oct 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
94
87
0
22 Sep 2022
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
14
22
0
22 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
19
26
0
20 Jul 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
26
0
0
28 Jun 2022
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
Zihao Zhu
NAI
ReLM
GNN
20
3
0
25 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
9
1
0
21 Jun 2022
FD-CAM: Improving Faithfulness and Discriminability of Visual Explanation for CNNs
Hui Li
Zihao Li
Rui Ma
Tieru Wu
FAtt
18
8
0
17 Jun 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
40
97
0
05 May 2022
Diverse Image Captioning with Grounded Style
Franz Klein
Shweta Mahajan
S. Roth
14
7
0
03 May 2022
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao
Xinglin Hou
Yuanmeng Zhang
T. Ge
Yuning Jiang
Peifeng Wang
25
10
0
27 Apr 2022
"It Feels Like Being Locked in A Cage": Understanding Blind or Low Vision Streamers' Perceptions of Content Curation Algorithms
Ethan Z. Rong
Mo Morgana Zhou
Zhicong Lu
Mingming Fan
9
22
0
24 Apr 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
Heng Wang
Chaoyi Zhang
Jianhui Yu
Weidong (Tom) Cai
3DPC
12
38
0
22 Apr 2022
Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection
Arshi Parvaiz
Muhammad Anwaar Khalid
Rukhsana Zafar
Huma Ameer
M. Ali
M. Fraz
MedIm
11
59
0
29 Mar 2022
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
Kohei Uehara
Yusuke Mori
Yusuke Mukuta
Tatsuya Harada
22
6
0
15 Feb 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Daniel Louzada Fernandes
Marcos Henrique Fonseca Ribeiro
F. Cerqueira
Michel Melo Silva
12
6
0
10 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
J. Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
147
48
0
10 Feb 2022
Robotic Grasping from Classical to Modern: A Survey
Hanbo Zhang
Jian Tang
Shiguang Sun
Xuguang Lan
27
39
0
08 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
88
0
31 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
26
2
0
28 Dec 2021
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
23
98
0
16 Dec 2021
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
11
46
0
13 Dec 2021
Previous
1
2
3
4
5
...
8
9
10
Next