CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,489 papers shown
MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP
Prajwal Ganugula
Y. Kumar
N. Reddy
Prabhath Chellingi
A. Thakur
Neeraj Kasera
C. S. Anand
CLIP, DiffM
167
4
0
24 Sep 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
International Conference on Learning Representations (ICLR), 2023
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
246
5
0
21 Sep 2023
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Computer Vision and Pattern Recognition (CVPR), 2023
Kashun Shum
Jaeyeon Kim
Binh-Son Hua
Duc Thanh Nguyen
Sai-Kit Yeung
3DH, AI4CE
227
10
0
20 Sep 2023
Guide Your Agent with Adaptive Multimodal Rewards
Neural Information Processing Systems (NeurIPS), 2023
Changyeon Kim
Younggyo Seo
Hao Liu
Lisa Lee
Jinwoo Shin
Honglak Lee
Kimin Lee
355
11
0
19 Sep 2023
Forgedit: Text Guided Image Editing via Learning and Forgetting
Shiwen Zhang
Shuai Xiao
Weilin Huang
DiffM
228
29
0
19 Sep 2023
What is the Best Automated Metric for Text to Motion Generation?
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2023
Jordan Voas
Yili Wang
Qixing Huang
Raymond Mooney
EGVM
276
17
0
19 Sep 2023
Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haochong Xia
Shuo Sun
Xinrun Wang
Bo An
AIFin
244
14
0
14 Sep 2023
Language Models as Black-Box Optimizers for Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Shihong Liu
Zhiqiu Lin
Samuel Yu
Ryan Lee
Tiffany Ling
Deepak Pathak
Deva Ramanan
VLM
411
42
0
12 Sep 2023
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
International Conference on Machine Learning (ICML), 2023
Zhi-Yi Chin
Chieh-Ming Jiang
Ching-Chun Huang
Pin-Yu Chen
Wei-Chen Chiu
DiffM
371
123
0
12 Sep 2023
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
International Conference on Language Resources and Evaluation (LREC), 2023
Guisheng Liu
Yi Li
Zhengcong Fei
Haiyan Fu
Xiangyang Luo
Yanqing Guo
VLM, DiffM
266
16
0
10 Sep 2023
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Jiapeng Zhu
Ceyuan Yang
Kecheng Zheng
Yinghao Xu
Zifan Shi
Yujun Shen
MoE
262
14
0
07 Sep 2023
Chasing Consistency in Text-to-3D Generation from a Single Image
Yichen Ouyang
Wenhao Chai
Jiayi Ye
Dapeng Tao
Yibing Zhan
Gaoang Wang
DiffM
206
16
0
07 Sep 2023
Generating Realistic Images from In-the-wild Sounds
IEEE International Conference on Computer Vision (ICCV), 2023
Taegyeong Lee
Jeonghun Kang
Hyeonyu Kim
Taehwan Kim
DiffM
256
11
0
05 Sep 2023
ControlMat: A Controlled Generative Approach to Material Capture
ACM Transactions on Graphics (TOG), 2023
Giuseppe Vecchio
Rosalie Martin
Arthur Roullier
Adrien Kaiser
Romain Rouffet
Valentin Deschaintre
T. Boubekeur
DiffM
256
64
0
04 Sep 2023
Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation
Asian Conference on Computer Vision (ACCV), 2023
Ryota Yoshihashi
Yuya Otsuka
Kenji Doi
Tomohiro Tanaka
Hirokatsu Kataoka
469
4
0
04 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Fengxiang Bie
Jianlong Wu
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
256
58
0
02 Sep 2023
Socratis: Are large multimodal models emotionally aware?
Katherine Deng
Arijit Ray
Reuben Tan
Saadia Gabriel
Bryan A. Plummer
Kate Saenko
344
8
0
31 Aug 2023
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Computer Vision and Pattern Recognition (CVPR), 2023
Hao Fei
Shengqiong Wu
Wei Ji
Hanwang Zhang
Tat-Seng Chua
VGen, DiffM
220
45
0
26 Aug 2023
Dense Text-to-Image Generation with Attention Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Yunji Kim
Jiyoung Lee
Jin-Hwa Kim
Jung-Woo Ha
Jun-Yan Zhu
DiffM
317
182
0
24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
186
31
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM, CLIP
229
24
0
23 Aug 2023
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Communications in Information and Systems (CIS), 2023
Chuer Chen
Nan Cao
Jiani Hou
Yi Guo
Yulei Zhang
Yang Shi
DiffM
201
1
0
22 Aug 2023
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Xujie Zhang
Binbin Yang
Michael C. Kampffmeyer
Wenqing Zhang
Shiyue Zhang
Guansong Lu
Liang Lin
Hang Xu
Xiaodan Liang
DiffM
396
17
0
22 Aug 2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
ACM Multimedia Asia (MA), 2023
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
156
4
0
20 Aug 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
AAAI Conference on Artificial Intelligence (AAAI), 2023
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
309
47
0
19 Aug 2023
DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization
Xiaoyu Ye
Hao Huang
Jiaqi An
Yongtao Wang
WIGM
226
26
0
19 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
171
8
0
18 Aug 2023
Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Minho Park
Jooyeol Yun
Seunghwan Choi
Jaegul Choo
DiffM
183
12
0
16 Aug 2023
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Hu Ye
Jun Zhang
Siyi Liu
Xiao Han
Wei Yang
DiffM
323
1,282
0
13 Aug 2023
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
Melissa Hall
Candace Ross
Adina Williams
Nicolas Carion
M. Drozdzal
Adriana Romero Soriano
EGVM
362
9
0
11 Aug 2023
The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE), 2023
Timothy Merino
Roman Negri
Dipika Rajesh
M. Charity
Julian Togelius
DiffM, VLM
165
19
0
08 Aug 2023
Learning Concise and Descriptive Attributes for Visual Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Andy Yan
Yu Wang
Yiwu Zhong
Chengyu Dong
Zexue He
Yujie Lu
William Wang
Jingbo Shang
Julian McAuley
VLM
297
85
0
07 Aug 2023
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
ACM Multimedia (ACM MM), 2023
Zheng Ma
Mianzhi Pan
Wenhan Wu
Ka Leong Cheng
Jianbing Zhang
Shujian Huang
Jiajun Chen
VLM, CoGe
237
8
0
06 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
275
43
0
03 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Computer Vision and Image Understanding (CVIU), 2023
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM, DiffM
276
11
0
02 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
IEEE International Conference on Computer Vision (ICCV), 2023
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
320
21
0
31 Jul 2023
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences
ACM Multimedia (ACM MM), 2023
Di Yang
Hongyu Chen
Xinglin Hou
Bo Xiao
Yuning Jiang
Qin Jin
241
8
0
31 Jul 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
227
1
0
29 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
199
2
0
27 Jul 2023
Improving Multimodal Datasets with Image Captioning
Neural Information Processing Systems (NeurIPS), 2023
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
263
125
0
19 Jul 2023
Text2Layer: Layered Image Generation using Latent Diffusion Model
Xinyang Zhang
Wentian Zhao
Xin Lu
J. Chien
DiffM
196
27
0
19 Jul 2023
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
ACM Multimedia (ACM MM), 2023
Federico Betti
Jacopo Staiano
Lorenzo Baraldi
Lorenzo Baraldi
Rita Cucchiara
Andrii Zadaianchuk
EGVM
154
12
0
18 Jul 2023
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
Sanghyun Kim
Seohyeong Jung
Balhae Kim
Moonseok Choi
Jinwoo Shin
Juho Lee
DiffM
140
37
0
12 Jul 2023
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
Neural Information Processing Systems (NeurIPS), 2023
Jaskirat Singh
Liang Zheng
306
39
0
10 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP, VLM
486
2
0
10 Jul 2023
CLIPAG: Towards Generator-Free Text-to-Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Roy Ganz
Michael Elad
VLM
227
15
0
29 Jun 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM, SSL
210
3
0
26 Jun 2023
Restart Sampling for Improving Generative Processes
Neural Information Processing Systems (NeurIPS), 2023
Yilun Xu
Mingyang Deng
Xiang Cheng
Yonglong Tian
Ziming Liu
Tommi Jaakkola
DiffM, VLM
315
78
0
26 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Neural Information Processing Systems (NeurIPS), 2023
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
350
7
0
23 Jun 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shih-Lun Wu
Yi-Hui Chou
Liang Li
152
0
0
16 Jun 2023