CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,474 papers shown
Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
Xingchen Wan
Han Zhou
Ruoxi Sun
Hootan Nakhost
Ke Jiang
Rajarishi Sinha
Sercan Ö. Arık
224
4
0
12 Sep 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Rongyao Fang
Aldrich Yu
Chengqi Duan
Linjiang Huang
S. Bai
Yuxuan Cai
Kun Wang
Si Liu
Xihui Liu
Xue Yang
EGVM, VGen, ReLM, LRM
206
7
0
11 Sep 2025
COCO-Urdu: A Large-Scale Urdu Image-Caption Dataset with Multimodal Quality Estimation
Umair Hassan
68
0
0
10 Sep 2025
Discovering Divergent Representations between Text-to-Image Models
Lisa Dunlap
Joseph E. Gonzalez
Trevor Darrell
Fabian Caba Heilbron
Josef Sivic
Bryan C. Russell
EGVM
120
0
0
10 Sep 2025
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Jeffrey Amico
Gabriel Passamani Andrade
John Donaghy
Ben Fielding
Tristin Forbus
...
Edward Phillip Flores Nuño
Diogo Ortega
Shikhar Rastogi
Austin Virts
Matthew J. Wright
OffRL, LRM
106
0
0
10 Sep 2025
Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation
Kaleem Ahmad
MLLM
74
0
0
10 Sep 2025
SVGauge: Towards Human-Aligned Evaluation for SVG Generation
Leonardo Zini
Elia Frigieri
Sebastiano Aloscari
Marcello Generali
Lorenzo Dodi
Robert Dosen
Lorenzo Baraldi
100
0
0
08 Sep 2025
Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching
Feng Wang
Zihao Yu
DiffM
195
11
0
07 Sep 2025
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
Yuxuan Hu
Jihao Liu
Ke Wang
Jinliang Zhen
Weikang Shi
Manyuan Zhang
Qi Dou
R. Liu
Aojun Zhou
Hongsheng Li
187
1
0
06 Sep 2025
The Telephone Game: Evaluating Semantic Drift in Unified Models
Sabbir Mollah
Rohit Gupta
S. Swetha
Qingyang Liu
Ahnaf Munir
Mubarak Shah
VLM
131
1
0
04 Sep 2025
SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation
Saki Imai
Mert Inan
Anthony Sicilia
Malihe Alikhani
SLR
168
1
0
04 Sep 2025
Measuring How (Not Just Whether) VLMs Build Common Ground
Saki Imai
Mert Inan
Anthony Sicilia
Malihe Alikhani
56
0
0
04 Sep 2025
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
Xiaofu Chen
Israfel Salazar
Yova Kementchedjhieva
180
1
0
04 Sep 2025
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Ouxiang Li
Yuan Wang
Xinting Hu
Huijuan Huang
Rui Chen
Jiarong Ou
Xin Tao
Pengfei Wan
Xiaojuan Qi
Fuli Feng
EGVM, CoGe, LRM
275
6
0
03 Sep 2025
TeRA: Rethinking Text-driven Realistic 3D Avatar Generation
Yanwen Wang
Yiyu Zhuang
Jiawei Zhang
Li Wang
Yifei Zeng
X. Cao
Xinxin Zuo
Hao Zhu
136
1
0
02 Sep 2025
Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation
Maelic Neau
Zoe Falomir
Cédric Buche
Akihiro Sugimoto
80
0
0
01 Sep 2025
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
N. Frumkin
Diana Marculescu
DiffM, MQ, VLM
125
0
0
01 Sep 2025
UI-Bench: A Benchmark for Evaluating Design Capabilities of AI Text-to-App Tools
Sam Jung
Agustin Garcinuno
Spencer Mateega
ELM
208
0
0
28 Aug 2025
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim
Seokhoon Jeong
Seonghee Han
Chanhyuk Choi
Taehwan Kim
DiffM
81
0
0
28 Aug 2025
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
Xurui Peng
Hong Liu
Chenqian Yan
Rui Ma
Fangmin Chen
X. Wang
Zhihua Wu
Songwei Liu
Mingbao Lin
DiffM
177
1
0
27 Aug 2025
Composition and Alignment of Diffusion Models using Constrained Learning
Shervin Khalafi
Ignacio Hounie
Dongsheng Ding
Alejandro Ribeiro
132
1
0
26 Aug 2025
The Mind's Eye: A Multi-Faceted Reward Framework for Guiding Visual Metaphor Generation
Girish A. Koushik
Fatemeh Nazarieh
Katherine Birch
Shenbin Qian
Diptesh Kanojia
EGVM
72
0
0
26 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
316
8
0
24 Aug 2025
MMCIG: Multimodal Cover Image Generation for Text-only Documents and Its Dataset Construction via Pseudo-labeling
Hyeyeon Kim
Sungwoo Han
Jingun Kwon
Hidetaka Kamigaito
Manabu Okumura
76
0
0
24 Aug 2025
Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
Guoqing Zhang
Xingtong Ge
Lu Shi
Xin Zhang
Muqing Xue
Wanru Xu
Yigang Cen
J. Zhang
DiffM
166
0
0
24 Aug 2025
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Kaiyue Sun
Rongyao Fang
Chengqi Duan
Xian Liu
Xihui Liu
126
11
0
24 Aug 2025
Structural Energy-Guided Sampling for View-Consistent Text-to-3D
Qing Zhang
Jinguang Tong
Jie Hong
Jing Zhang
Xuesong Li
DiffM, 3DGS
136
0
0
23 Aug 2025
Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation
Arka Mukherjee
Shreya Ghosh
VLM
112
1
0
22 Aug 2025
Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
Shikang Zheng
Liang Feng
Xinyu Wang
Qinming Zhou
Peiliang Cai
...
Jiacheng Liu
Yuqi Lin
Junjie Chen
Yue Ma
Linfeng Zhang
88
3
0
22 Aug 2025
A Framework for Benchmarking Fairness-Utility Trade-offs in Text-to-Image Models via Pareto Frontiers
Marco N. Bochernitsan
Rodrigo C. Barros
L. S. Kupssinskü
EGVM
92
0
0
22 Aug 2025
CurveFlow: Curvature-Guided Flow Matching for Image Generation
Yan Luo
Drake Du
Niraj Pudasaini
Yi Fang
Mengyu Wang
199
3
0
20 Aug 2025
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng
Junqiang Wu
Yan Li
Tingting Gao
Di Zhang
Huiyuan Fu
DiffM
128
2
0
20 Aug 2025
Inference Time Debiasing Concepts in Diffusion Models
L. S. Kupssinskü
Marco N. Bochernitsan
Jordan Kopper
Otávio Parraga
Rodrigo C. Barros
DiffM
100
1
0
19 Aug 2025
Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score
Syed Muhmmad Israr
Feng Zhao
DiffM
124
0
0
18 Aug 2025
Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference
Denis Blessing
Julius Berner
Lorenz Richter
Carles Domingo-Enrich
Yuanqi Du
Arash Vahdat
Gerhard Neumann
112
5
0
17 Aug 2025
Region-Level Context-Aware Multimodal Understanding
Hongliang Wei
Xianqi Zhang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
VLM
149
0
0
17 Aug 2025
Generative Medical Event Models Improve with Scale
Shane Waxler
Paul Blazek
Davis White
Daniel Sneider
Kevin Chung
...
Hoifung Poon
Andrew Loza
Daniella Meeker
Seth Hain
Rahul Shah
MedIm
206
0
0
16 Aug 2025
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
Yanghao Wang
Long Chen
DiffM, VLM
260
2
0
15 Aug 2025
LoRAtorio: An intrinsic approach to LoRA Skill Composition
Niki Foteinopoulou
Ignas Budvytis
Stephan Liwicki
MoMe
125
0
0
15 Aug 2025
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
Xuezheng Chen
Zhengbo Zou
MLLM
80
0
0
14 Aug 2025
TweezeEdit: Consistent and Efficient Image Editing with Path Regularization
Jianda Mao
Kaibo Wang
Yang Xiang
Kani Chen
DiffM
80
1
0
14 Aug 2025
Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models
H. J. Lee
Suhyung Choi
Byoung-Tak Zhang
Inwoo Hwang
176
0
0
14 Aug 2025
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh
Seunghoo Hong
Tae-Young Kim
Simon S. Woo
Jae-Pil Heo
DiffM
215
0
0
14 Aug 2025
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
L. Eyring
Shyamgopal Karthik
Alexey Dosovitskiy
Nataniel Ruiz
Zeynep Akata
DiffM
159
8
0
13 Aug 2025
Images Speak Louder Than Scores: Failure Mode Escape for Enhancing Generative Quality
Jie Shao
Ke Zhu
Minghao Fu
Guo-Hua Wang
Jianxin Wu
100
0
0
13 Aug 2025
Collaborative Face Experts Fusion in Video Generation: Boosting Identity Consistency Across Large Face Poses
Yuji Wang
Moran Li
Xiaobin Hu
Ran Yi
Jiangning Zhang
Chengming Xu
Weijian Cao
Yabiao Wang
Chengjie Wang
Lizhuang Ma
156
0
0
13 Aug 2025
RefAdGen: High-Fidelity Advertising Image Generation
Yiyun Chen
Weikai Yang
102
0
0
12 Aug 2025
Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
Ao Ma
Jiasong Feng
Ke Cao
Jing Wang
Yun Wang
Quanwei Zhang
Zhanjie Zhang
DiffM, VGen
138
4
0
12 Aug 2025
X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
Jian Ma
Xujie Zhu
Zihao Pan
Qirong Peng
Xu Guo
Chen Chen
H. Lu
140
4
0
11 Aug 2025
S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix
Peng Dai
Feitong Tan
Qiangeng Xu
Yihua Huang
David Futschik
Ruofei Du
S. Fanello
Yinda Zhang
Xiaojuan Qi
VGen
92
0
0
11 Aug 2025