CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,488 papers shown
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
174
29
0
18 May 2023
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
227
28
0
10 May 2023
iEdit: Localised Text-guided Image Editing with Weak Supervision
Rumeysa Bodur
Erhan Gundogdu
Binod Bhattarai
Tae-Kyun Kim
M. Donoser
Loris Bazzani
DiffM
196
20
0
10 May 2023
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
Yupei Lin
Senyang Zhang
Xiaojun Yang
Tianlin Li
Yukai Shi
DiffM
139
7
0
08 May 2023
Locally Attentional SDF Diffusion for Controllable 3D Shape Generation
ACM Transactions on Graphics (TOG), 2023
Xin-Yang Zheng
Hao Pan
Peng-Shuai Wang
Xin Tong
Yang Liu
H. Shum
302
164
0
08 May 2023
Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
ACM Multimedia (ACM MM), 2023
Shengfang Zhai
Yinpeng Dong
Qingni Shen
Shih-Chieh Pu
Yuejian Fang
Hang Su
229
97
0
07 May 2023
A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Andrea Burns
Krishna Srinivasan
Joshua Ainslie
Geoff Brown
Bryan A. Plummer
Kate Saenko
Jianmo Ni
Mandy Guo
3DV
218
16
0
05 May 2023
The Role of Data Curation in Image Captioning
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Wenyan Li
Jonas F. Lotz
Chen Qiu
Desmond Elliott
DiffM
258
8
0
05 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
207
16
0
03 May 2023
Multimodal Procedural Planning via Dual Text-Image Prompting
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yujie Lu
Pan Lu
Zhiyu Zoey Chen
Wanrong Zhu
Xinze Wang
William Yang Wang
LM&Ro
234
53
0
02 May 2023
SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis
Azade Farshad
Yousef Yeganeh
Yucong Chi
Cheng-nan Shen
Bjorn Ommer
Nassir Navab
DiffM
226
40
0
28 Apr 2023
Rethinking Benchmarks for Cross-modal Image-text Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Wei Chen
Linli Yao
Qin Jin
VLM
288
23
0
21 Apr 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing
Seung Hyun Lee
Si-Yeol Kim
Innfarn Yoo
Feng Yang
Donghyeon Cho
Youngseo Kim
Huiwen Chang
Jinkyu Kim
Sangpil Kim
VGen, DiffM
188
20
0
13 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
Computer Vision and Pattern Recognition (CVPR), 2023
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
149
3
0
13 Apr 2023
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
James Smith
Yen-Chang Hsu
Lingyu Zhang
Ting Hua
Z. Kira
Yilin Shen
Hongxia Jin
DiffM
447
143
0
12 Apr 2023
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
IEEE International Conference on Computer Vision (ICCV), 2023
Eslam Mohamed Bakr
Pengzhan Sun
Xiaoqian Shen
Faizan Farooq Khan
Li Erran Li
Mohamed Elhoseiny
VLM
307
104
0
11 Apr 2023
OpenAGI: When LLM Meets Domain Experts
Neural Information Processing Systems (NeurIPS), 2023
Yingqiang Ge
Qingfeng Lan
Kai Mei
Jianchao Ji
Juntao Tan
Shuyuan Xu
Zelong Li
VLM, LRM
317
308
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
339
23
0
07 Apr 2023
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Alberto Baldrati
Davide Morelli
Giuseppe Cartella
Marcella Cornia
Marco Bertini
Rita Cucchiara
DiffM
212
89
0
04 Apr 2023
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Mayu Otani
Riku Togashi
Yu Sawai
Ryosuke Ishigami
Yuta Nakashima
Esa Rahtu
J. Heikkilä
Shin'ichi Satoh
224
78
0
04 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Computer Vision and Pattern Recognition (CVPR), 2023
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
248
25
0
04 Apr 2023
Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models
IEEE International Conference on Computer Vision (ICCV), 2023
Jaewoong Lee
Sang-Sub Jang
Jaehyeong Jo
Jaehong Yoon
Yunji Kim
Jin-Hwa Kim
Jung-Woo Ha
Sung Ju Hwang
DiffM
239
7
0
04 Apr 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yue Ma
Yin-Yin He
Xiaodong Cun
Xintao Wang
Siran Chen
Ying Shan
Xiu Li
Qifeng Chen
DiffM, VGen
275
274
0
03 Apr 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Wen Wang
Yan Jiang
K. Xie
Zide Liu
Hao Chen
Yue Cao
Xinlong Wang
Chunhua Shen
DiffM, VGen
268
135
0
30 Mar 2023
MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
Qian Wang
Biao Zhang
Michael Birsak
Peter Wonka
DiffM
251
23
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Joey Tianyi Zhou
3DV
276
85
0
29 Mar 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Joey Tianyi Zhou
Aniruddha Kembhavi
248
4
0
28 Mar 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Senmao Li
Joost van de Weijer
Taihang Hu
Fahad Shahbaz Khan
Qibin Hou
Yaxing Wang
Jian Yang
DiffM
398
75
0
28 Mar 2023
Fine-grained Audible Video Description
Computer Vision and Pattern Recognition (CVPR), 2023
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
175
18
0
27 Mar 2023
Ablating Concepts in Text-to-Image Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Nupur Kumari
Bin Zhang
Sheng-Yu Wang
Eli Shechtman
Richard Y. Zhang
Jun-Yan Zhu
VLM
482
283
0
23 Mar 2023
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
IEEE International Conference on Computer Vision (ICCV), 2023
Levon Khachatryan
A. Movsisyan
Vahram Tadevosyan
Roberto Henschel
Zinan Lin
Shant Navasardyan
Humphrey Shi
VGen
308
733
0
23 Mar 2023
Zero-guidance Segmentation Using Zero Segment Labels
IEEE International Conference on Computer Vision (ICCV), 2023
Pitchaporn Rewatbowornwong
Nattanat Chatthee
Ekapol Chuangsuwanich
Supasorn Suwajanakorn
VLM
173
18
0
23 Mar 2023
Pix2Video: Video Editing using Image Diffusion
IEEE International Conference on Computer Vision (ICCV), 2023
Duygu Ceylan
C. Huang
Niloy J. Mitra
DiffM, VGen
412
339
0
22 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Computer Vision and Pattern Recognition (CVPR), 2023
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
335
86
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
381
50
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
IEEE International Conference on Computer Vision (ICCV), 2023
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
337
344
0
21 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Arushi Rai
Adriana Kovashka
290
0
0
16 Mar 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Zhiheng Liu
Xiaodong Cun
Yong Zhang
Chenyang Lei
Xintao Wang
Ying Shan
Qifeng Chen
VGen
413
466
0
16 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yongil Kim
Yerin Hwang
Hyeongu Yun
Seunghyun Yoon
Trung Bui
Kyomin Jung
270
7
0
15 Mar 2023
Editing Implicit Assumptions in Text-to-Image Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Hadas Orgad
Bahjat Kawar
Yonatan Belinkov
DiffM
368
116
0
14 Mar 2023
Text-to-image Diffusion Models in Generative AI: A Survey
Chenshuang Zhang
Chaoning Zhang
Mengchun Zhang
In So Kweon
VLM
315
380
0
14 Mar 2023
Scaling up GANs for Text-to-Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Minguk Kang
Jun-Yan Zhu
Richard Y. Zhang
Jaesik Park
Eli Shechtman
Sylvain Paris
Taesung Park
328
601
0
09 Mar 2023
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Hritik Bansal
Nishad Singhi
Yu Yang
Fan Yin
Aditya Grover
Kai-Wei Chang
AAML
373
66
0
06 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
International Conference on Learning Representations (ICLR), 2023
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
229
119
0
06 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hui Liu
Xiaojun Wan
HILM
183
18
0
06 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Computer Vision and Pattern Recognition (CVPR), 2023
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL, DiffM
225
58
0
04 Mar 2023
X&Fuse: Fusing Visual Information in Text-to-Image Generation
Yuval Kirstain
Omer Levy
Adam Polyak
DiffM
99
6
0
02 Mar 2023
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
International Conference on Learning Representations (ICLR), 2023
Ivona Najdenkoska
Xiantong Zhen
Marcel Worring
VLM
195
31
0
28 Feb 2023
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
AAAI Conference on Artificial Intelligence (AAAI), 2023
W. Ma
J. P. Lewis
Avisek Lahiri
Thomas Leung
W. Kleijn
DiffM
363
82
0
25 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Computer Vision and Pattern Recognition (CVPR), 2023
Mohamed El Banani
Karan Desai
Justin Johnson
SSL, VLM
398
36
0
23 Feb 2023