CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP
Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,489 papers shown
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Hongcheng Gao
Hao Zhang
Yinpeng Dong
Zhijie Deng
AAML
305
25
0
16 Jun 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Le Zhang
Rabiul Awal
Aishwarya Agrawal
CoGe
VLM
418
27
0
15 Jun 2023
Pragmatic Inference with a CLIP Listener for Contrastive Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiefu Ou
Benno Krojer
Daniel Fried
272
6
0
15 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
309
21
0
14 Jun 2023
Scalable 3D Captioning with Pretrained Models
Neural Information Processing Systems (NeurIPS), 2023
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
311
213
0
12 Jun 2023
Boosting GUI Prototyping with Diffusion Models
IEEE International Requirements Engineering Conference (RE), 2023
Jialiang Wei
A. Courbis
Thomas Lambolais
Binbin Xu
P. Bernard
Gérard Dray
DiffM
170
25
0
09 Jun 2023
Grounded Text-to-Image Synthesis with Attention Refocusing
Computer Vision and Pattern Recognition (CVPR), 2023
Quynh Phung
Songwei Ge
Jia-Bin Huang
DiffM
414
157
0
08 Jun 2023
SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
Neural Information Processing Systems (NeurIPS), 2023
Yuseung Lee
Kunho Kim
Hyunjin Kim
Minhyuk Sung
DiffM
389
91
0
08 Jun 2023
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2023
Changhoon Kim
Kyle Min
Maitreya Patel
Sheng Cheng
Yezhou Yang
WIGM
338
53
0
07 Jun 2023
AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment
Chunyi Li
Zicheng Zhang
Haoning Wu
Wei Sun
Xiongkuo Min
Xiaohong Liu
Guangtao Zhai
Weisi Lin
EGVM
259
195
0
07 Jun 2023
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
269
16
0
07 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Neural Information Processing Systems (NeurIPS), 2023
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
367
203
0
07 Jun 2023
Multi-modal Latent Diffusion
Entropy (Entropy), 2023
Mustapha Bounoua
Giulio Franzese
Pietro Michiardi
DiffM
264
16
0
07 Jun 2023
HeadSculpt: Crafting 3D Head Avatars with Text
Neural Information Processing Systems (NeurIPS), 2023
Xiaoping Han
Yukang Cao
Kai Han
Xiatian Zhu
Jiankang Deng
Yi-Zhe Song
Tao Xiang
Kwan-Yee K. Wong
DiffM
203
61
0
05 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
International Conference on Machine Learning (ICML), 2023
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
470
38
0
02 Jun 2023
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Dana Arad
Hadas Orgad
Yonatan Belinkov
KELM
348
29
0
01 Jun 2023
Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images
Peyman Gholami
R. Xiao
DiffM
257
3
0
31 May 2023
Understanding and Mitigating Copying in Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Gowthami Somepalli
Vasu Singla
Micah Goldblum
Jonas Geiping
Tom Goldstein
DiffM
273
201
0
31 May 2023
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Guian Fang
Zutao Jiang
Jianhua Han
Guangsong Lu
Hang Xu
Shengcai Liao
Xiaodan Liang
EGVM
171
2
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
British Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
261
9
0
30 May 2023
Nested Diffusion Processes for Anytime Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Elata
Bahjat Kawar
T. Michaeli
Michael Elad
DiffM
262
7
0
30 May 2023
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
Pengzhi Li
Qinxuan Huang
Yikang Ding
Zhiheng Li
DiffM
214
43
0
30 May 2023
TaleCrafter: Interactive Story Visualization with Multiple Characters
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2023
Yuan Gong
Youxin Pang
Xiaodong Cun
Menghan Xia
Yingqing He
...
Longyue Wang
Yong Zhang
Xintao Wang
Ying Shan
Yujiu Yang
DiffM
351
65
0
29 May 2023
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang
Biao Zhang
Michael Birsak
Peter Wonka
DiffM
249
56
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yi Yang
VLM
322
39
0
29 May 2023
Conditional Score Guidance for Text-Driven Image-to-Image Translation
Neural Information Processing Systems (NeurIPS), 2023
Hyunsoo Lee
Minsoo Kang
Bohyung Han
DiffM
188
19
0
29 May 2023
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Jiawei Huang
Yi Ren
Rongjie Huang
Dongchao Yang
Zhenhui Ye
Chen Zhang
Jinglin Liu
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
221
98
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
393
53
0
28 May 2023
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhuang Li
Yuyang Chai
Terry Yue Zhuo
Lizhen Qu
Gholamreza Haffari
Fei Li
Donghong Ji
Quan Hung Tran
327
51
0
27 May 2023
Towards Consistent Video Editing with Text-to-Image Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Zicheng Zhang
Bonan Li
Xuecheng Nie
Congying Han
Tiande Guo
Luoqi Liu
DiffM
136
43
0
27 May 2023
Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zihao Yu
Haoyang Li
Fangcheng Fu
Xupeng Miao
Tengjiao Wang
DiffM
312
16
0
27 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
180
26
0
27 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts
Asian Conference on Computer Vision (ACCV), 2023
Qi Chen
Yutong Xie
Biao Wu
Minh-Son To
James Ang
Qi Wu
155
2
0
26 May 2023
Are Diffusion Models Vision-And-Language Reasoners?
Neural Information Processing Systems (NeurIPS), 2023
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
500
17
0
25 May 2023
Parallel Sampling of Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Andy Shih
Suneel Belkhale
Stefano Ermon
Dorsa Sadigh
Nima Anari
DiffM
438
100
0
25 May 2023
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
European Conference on Computer Vision (ECCV), 2023
Ibrahim Ethem Hamamci
Sezgin Er
Anjany Sekuboyina
Enis Simsar
A. Tezcan
...
Hadrien Reynaud
Sarthak Pati
Christian Bluethgen
M. K. Özdemir
Bjoern Menze
DiffM
MedIm
392
52
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chi Chen
Peng Li
Maosong Sun
Yang Liu
152
2
0
24 May 2023
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Neural Information Processing Systems (NeurIPS), 2023
Marco Bellagente
Manuel Brack
H. Teufel
Felix Friedrich
Bjorn Deiseroth
...
Koen Oostermeijer
Andres Felipe Cruz Salinas
P. Schramowski
Kristian Kersting
Samuel Weinbach
376
28
0
24 May 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
162
7
0
24 May 2023
Transferring Visual Attributes from Natural Language to Verified Image Generation
Rodrigo Valerio
João Bordalo
Michal Yarom
Yonattan Bitton
Idan Szpektor
João Magalhães
194
5
0
24 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Findings (Findings), 2023
Saba Ahmadi
Aishwarya Agrawal
232
14
0
24 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
413
22
0
24 May 2023
Text-guided 3D Human Generation from 2D Collections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tsu-Jui Fu
Wenhan Xiong
Yixin Nie
Jingyu Liu
Barlas Oğuz
William Yang Wang
175
3
0
23 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yi Yang
CLIP
VLM
381
45
0
23 May 2023
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik
Karsten Roth
Goran Frehse
Zeynep Akata
236
33
0
22 May 2023
The CLIP Model is Secretly an Image-to-Prompt Converter
Neural Information Processing Systems (NeurIPS), 2023
Yuxuan Ding
Chunna Tian
Haoxuan Ding
Lingqiao Liu
DiffM
150
17
0
22 May 2023
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Jie Yang
Bing Li
Fengyu Yang
Ailing Zeng
Lei Zhang
Ruimao Zhang
VLM
DiffM
275
31
0
20 May 2023
Movie101: A New Movie Understanding Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zihao Yue
Tao Gui
Anwen Hu
Liang Zhang
Ziheng Wang
Qin Jin
VGen
278
25
0
20 May 2023
LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis
Yu Xie
Rui Li
Kaidong Zhang
Xin Luo
Dong Liu
DiffM
479
5
0
19 May 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Neural Information Processing Systems (NeurIPS), 2023
Yujie Lu
Xianjun Yang
Xiujun Li
Xinze Wang
William Yang Wang
EGVM
431
99
0
18 May 2023