2212.03099
Semantic-Conditional Diffusion Networks for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
6 December 2022
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
Github (970★)
Papers citing "Semantic-Conditional Diffusion Networks for Image Captioning"
29 citing papers
Robust Learning of Diffusion Models with Extremely Noisy Conditions
Xin Chen
Gillian Dobbie
Xinyu Wang
Yifan Zhang
D. Wang
Jingfeng Zhang
DiffM
80
0
0
11 Oct 2025
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
Wei Pan
Huiguo He
Hiuyi Cheng
Yilin Shi
Lianwen Jin
DiffM
94
0
0
28 Sep 2025
Diff-3DCap: Shape Captioning with Diffusion Models
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Zhenyu Shu
Jiawei Wen
Shiyang Li
Shiqing Xin
Ligang Liu
DiffM
87
0
0
28 Sep 2025
MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning
Junha Song
Yongsik Jo
So Yeon Min
Quanting Xie
Taehwan Kim
Yonatan Bisk
Jaegul Choo
VLM
132
0
0
29 Aug 2025
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang
Zhaolu Kang
Wangyuxuan Zhai
Xinyue Lou
Yunghwei Lai
...
Yawen Wang
Kaiyu Huang
Yile Wang
Peng Li
Wenshu Fan
146
0
0
20 Jun 2025
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao
Siqi Chen
Hanwang Zhang
Jun Xiao
Yueting Zhuang
Qianru Sun
249
0
0
16 Apr 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
326
8
0
13 Mar 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
225
1
0
03 Jan 2025
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
198
1
0
02 Dec 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
244
0
0
09 Aug 2024
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
Youngmin Oh
Hyung-Il Kim
Seong Tae Kim
Jung Uk Kim
DiffM
174
5
0
23 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm
DiffM
322
3
0
01 Jul 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
194
8
0
01 Jun 2024
Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images
Xiaofei Yu
Yitong Li
Jie Ma
DiffM
148
0
0
21 May 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
224
14
0
16 Apr 2024
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian
Qi Cai
Yingwei Pan
Yehao Li
Ting Yao
Qibin Sun
Tao Mei
DiffM
201
39
0
26 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
168
33
0
25 Mar 2024
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen
Yingwei Pan
Haibo Yang
Ting Yao
Tao Mei
DiffM
243
27
0
25 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
184
16
0
12 Mar 2024
Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction
Yonghao Dong
Le Wang
Sanpin Zhou
Gang Hua
Changyin Sun
302
16
0
09 Mar 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
208
11
0
21 Feb 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
180
19
0
14 Dec 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
European Conference on Computer Vision (ECCV), 2024
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
172
4
0
25 Nov 2023
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Chang Liu
Yuanhe Tian
Yan Song
MedIm
276
21
0
23 Nov 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sijin Chen
Erik Cambria
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
252
39
0
06 Sep 2023
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
Francesco Taioli
Federico Cunico
Federico Girella
Riccardo Bologna
Alessandro Farinelli
Marco Cristani
169
8
0
17 Aug 2023
Any-to-Any Generation via Composable Diffusion
Neural Information Processing Systems (NeurIPS), 2023
Zineng Tang
Ziyi Yang
Chenguang Zhu
Michael Zeng
Joey Tianyi Zhou
VGen
DiffM
247
234
0
19 May 2023
Diffusion Models for Non-autoregressive Text Generation: A Survey
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yifan Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
MedIm
DiffM
266
49
0
12 Mar 2023
OSIC: A New One-Stage Image Captioner Coined
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Bo Wang
Zhao Zhang
Ming Zhao
Xiaojie Jin
Mingliang Xu
Meng Wang
VLM
160
6
0
04 Nov 2022