ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

CLIPScore: A Reference-free Evaluation Metric for Image Captioning
v1, v2, v3 (latest)

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,488 papers shown
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
Yanghao Wang
Long Chen
DiffM, VLM
284
2
0
15 Aug 2025
TweezeEdit: Consistent and Efficient Image Editing with Path Regularization
Jianda Mao
Kaibo Wang
Yang Xiang
Kani Chen
DiffM
104
1
0
14 Aug 2025
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
Xuezheng Chen
Zhengbo Zou
MLLM
95
0
0
14 Aug 2025
CountCluster: Training-Free Object Quantity Guidance with Cross-Attention Map Clustering for Text-to-Image Generation
Joohyeon Lee
Jin-Seop Lee
Jee-Hyong Lee
109
0
0
14 Aug 2025
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh
Seunghoo Hong
Tae-Young Kim
Simon S. Woo
Jae-Pil Heo
DiffM
273
0
0
14 Aug 2025
Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models
H. J. Lee
Suhyung Choi
Byoung-Tak Zhang
Inwoo Hwang
189
0
0
14 Aug 2025
Images Speak Louder Than Scores: Failure Mode Escape for Enhancing Generative Quality
Jie Shao
Ke Zhu
Minghao Fu
Guo-Hua Wang
Jianxin Wu
105
0
0
13 Aug 2025
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
L. Eyring
Shyamgopal Karthik
Alexey Dosovitskiy
Nataniel Ruiz
Zeynep Akata
DiffM
194
8
0
13 Aug 2025
Collaborative Face Experts Fusion in Video Generation: Boosting Identity Consistency Across Large Face Poses
Yuji Wang
Moran Li
Xiaobin Hu
Ran Yi
Jiangning Zhang
Chengming Xu
Weijian Cao
Yabiao Wang
Chengjie Wang
Lizhuang Ma
238
0
0
13 Aug 2025
Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
Ao Ma
Jiasong Feng
Ke Cao
Jing Wang
Yun Wang
Quanwei Zhang
Zhanjie Zhang
DiffM, VGen
157
5
0
12 Aug 2025
RefAdGen: High-Fidelity Advertising Image Generation
Yiyun Chen
Weikai Yang
111
0
0
12 Aug 2025
S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix
Peng Dai
Feitong Tan
Qiangeng Xu
Yihua Huang
David Futschik
Ruofei Du
S. Fanello
Yinda Zhang
Xiaojuan Qi
VGen
134
0
0
11 Aug 2025
X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
Jian Ma
Xujie Zhu
Zihao Pan
Qirong Peng
Xu Guo
Chen Chen
H. Lu
154
5
0
11 Aug 2025
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin
Alchan Hwang
Yujin Kim
Daneul Kim
Jaesik Park
DiffM
121
4
0
11 Aug 2025
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization
Animesh Jain
Alexandros Stergiou
125
0
0
11 Aug 2025
HiMat: DiT-based Ultra-High Resolution SVBRDF Generation
Zixiong Wang
Jian Yang
Yiwei Hu
Milos Hasan
Beibei Wang
219
0
0
09 Aug 2025
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity
Yuhan Zhang
Long Zhuo
Ziyang Chu
Tong Wu
Zhibing Li
Liang Pan
Dahua Lin
Ziwei Liu
228
0
0
07 Aug 2025
Towards Robust Evaluation of Visual Activity Recognition: Resolving Verb Ambiguity with Sense Clustering
Louie Hong Yao
Nicholas Jarvis
Tianyu Jiang
95
0
0
07 Aug 2025
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Hao Dong
Lijun Sheng
Jian Liang
Ran He
Eleni Chatzi
Olga Fink
OffRL, VLM
216
4
0
07 Aug 2025
A Novel Image Similarity Metric for Scene Composition Structure
Md Redwanul Haque
Manzur Murshed
Manoranjan Paul
Tsz-Kwan Lee
246
0
0
07 Aug 2025
Multimodal RAG Enhanced Visual Description
Amit Kumar Jaiswal
Haiming Liu
Ingo Frommholz
VLM
127
0
0
06 Aug 2025
HierarchicalPrune: Position-Aware Compression for Large-Scale Diffusion Models
Young D. Kwon
Rui Li
Sijia Li
Da Li
S. Bhattacharya
Stylianos I. Venieris
VLM
168
2
0
06 Aug 2025
StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback
Hongbo Ma
Fei Shen
Hongbin Xu
Xiaoce Wang
Gang Xu
Jinkai Zheng
Liangqiong Qu
Ming Li
208
0
0
06 Aug 2025
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
Jinxing Zhou
Yanghao Zhou
Mingfei Han
Tong Wang
Xiaojun Chang
Hisham Cholakkal
Rao Muhammad Anwer
VOS, LRM
182
1
0
06 Aug 2025
Diffusion Models with Adaptive Negative Sampling Without External Resources
Alakh Desai
Nuno Vasconcelos
DiffM
162
0
0
05 Aug 2025
Bias Beyond Demographics: Probing Decision Boundaries in Black-Box LVLMs via Counterfactual VQA
Zaiying Zhao
Toshihiko Yamasaki
VLM
174
0
0
05 Aug 2025
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma
Xiaopei Yang
Yusong Li
Ming Gui
Felix Krause
Johannes Schusterbauer
Bjorn Ommer
DRL
243
1
0
05 Aug 2025
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim
Jaewoo Ahn
Gunhee Kim
116
1
0
05 Aug 2025
VQA support to Arabic Language Learning Educational Tool
Khaled Bachir Delassi
Lakhdar Zeggane
H. Cherroun
Abdelhamid Haouhat
Kaoutar Bouzouad
AI4Ed
170
0
0
05 Aug 2025
StrandDesigner: Towards Practical Strand Generation with Sketch Guidance
Na Zhang
Moran Li
Chengming Xu
Han Feng
Xiaobin Hu
Jiangning Zhang
Weijian Cao
Chengjie Wang
Yanwei Fu
DiffM
84
0
0
03 Aug 2025
Personalized Safety Alignment for Text-to-Image Diffusion Models
Yu Lei
Jinbin Bai
Qingyu Shi
Aosong Feng
Kaidong Yu
EGVM
195
0
0
02 Aug 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin
Hyung Jin Chang
Eunwoo Kim
VLM
135
0
0
01 Aug 2025
Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation
Nan Xiang
Tianyi Liang
Jia Bu
Shiqi Jiang
Hao Huang
Yifei Huang
Liangyu Chen
Changbo Wang
Chenhui Li
DiffM
189
1
0
01 Aug 2025
Adversarial-Guided Diffusion for Multimodal LLM Attacks
Chengwei Xia
Fan Ma
Ruijie Quan
Kun Zhan
Yi Yang
DiffM
192
1
0
31 Jul 2025
Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods
Siwoo Park
100
0
0
30 Jul 2025
MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors
Shouyi Lu
Zihan Lin
Chao Lu
Huanran Wang
Guirong Zhuo
Lianqing Zheng
DiffM
254
0
0
29 Jul 2025
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang
Binzhu Xie
Zhonghao Yan
Yuli Zhang
Donghao Zhou
Xiaofei Chen
Shi Qiu
Jiaqi Liu
Guoyang Xie
Zhichao Lu
160
2
0
29 Jul 2025
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
HunyuanWorld Team
Zhenwei Wang
Yuhao Liu
Junta Wu
Zixiao Gu
...
Y. Liu
Linus
Jie Jiang
Tengfei Wang
Chunchao Guo
VGen
265
0
0
29 Jul 2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou
Ruiyi Zhang
Huaisheng Zhu
Branislav Kveton
Jiuxiang Gu
J. Gu
Jian Chen
Changyou Chen
MLLM, VLM, LRM
368
6
0
28 Jul 2025
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen
Min Shi
Gong Zhang
Humphrey Shi
MLLM
293
3
0
28 Jul 2025
Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
Chao Wu
Zhenyi Wang
Kangxian Xie
Naresh Kumar Devulapally
Vishnu Suresh Lokhande
Mingchen Gao
193
0
0
28 Jul 2025
On Explaining Visual Captioning with Hybrid Markov Logic Networks
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
VLM
171
0
0
28 Jul 2025
A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction
Xiaohua Feng
Jiaming Zhang
Fengyuan Yu
C. Wang
Li Zhang
Kaixiang Li
Yuyuan Li
Chaochao Chen
Jianwei Yin
MU
262
2
0
26 Jul 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Yusuke Hirota
Boyi Li
Ryo Hachiuma
Yueh-Hua Wu
Boris Ivanovic
Yuta Nakashima
Marco Pavone
Yejin Choi
Yu-Chun Wang
Chao-Han Huck Yang
VLM
199
1
0
25 Jul 2025
A Survey of Multimodal Hallucination Evaluation and Detection
Zhiyuan Chen
Yuecong Min
Jie M. Zhang
Bei Yan
Jiahao Wang
X. Wang
Shiguang Shan
HILM
344
4
0
25 Jul 2025
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim
MinJu Jeon
Ye-Chan Kim
Soeun Lee
Taewhan Kim
Dong-Jin Kim
177
3
0
24 Jul 2025
T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
Yubin Chen
Xuyang Guo
Zhenmei Shi
Zhao Song
Jiahao Zhang
VGen
674
9
0
24 Jul 2025
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
Zhekai Chen
Ruihang Chu
Yukang Chen
Shiwei Zhang
Yujie Wei
Yingya Zhang
Xihui Liu
257
8
0
24 Jul 2025
COT-AD: Cotton Analysis Dataset
International Conference on Image Processing (ICIP), 2025
Akbar Ali
Mahek Vyas
Soumyaratna Debnath
Chanda Grover Kamra
Jaidev Sanjay Khalane
Reuben Shibu Devanesan
Indra Deep Mastan
Subramanian Sankaranarayanan
Pankaj Khanna
Shanmuganathan Raman
115
0
0
24 Jul 2025
HarmonPaint: Harmonized Training-Free Diffusion Inpainting
Ying Li
Xinzhe Li
Yong Du
Yangyang Xu
Junyu Dong
Shengfeng He
DiffM
169
0
0
22 Jul 2025