2104.08718
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing
"CLIPScore: A Reference-free Evaluation Metric for Image Captioning"
50 / 1,488 papers shown
PEO: Training-Free Aesthetic Quality Enhancement in Pre-Trained Text-to-Image Diffusion Models with Prompt Embedding Optimization
Hovhannes Margaryan
Bo Wan
Tinne Tuytelaars
280
0
0
02 Oct 2025
Learn to Guide Your Diffusion Model
Alexandre Galashov
Ashwini Pokle
Arnaud Doucet
Arthur Gretton
Mauricio Delbracio
Valentin De Bortoli
DiffM
437
0
0
01 Oct 2025
Multi-Objective Task-Aware Predictor for Image-Text Alignment
Eunki Kim
Na Min An
James Thorne
Hyunjung Shim
133
0
0
01 Oct 2025
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
Nilay Naharas
Dang Nguyen
Nesihan Bulut
M. Bateni
Vahab Mirrokni
Baharan Mirzasoleiman
100
0
0
01 Oct 2025
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
Yuxiang Guo
Jiang Liu
Ze Wang
Hao Chen
Ximeng Sun
Yang Zhao
Jialian Wu
Xiaodong Yu
Zicheng Liu
Emad Barsoum
LM&MA
134
0
0
01 Oct 2025
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Wei-Yao Wang
Kazuya Tateishi
Qiyu Wu
Shusuke Takahashi
Yuki Mitsufuji
VLM
143
0
0
01 Oct 2025
FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
Siddhant Sukhani
Yash Bhardwaj
Riya Bhadani
Veer Kejriwal
Michael Galarnyk
Sudheer Chava
80
0
0
30 Sep 2025
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Keming Wu
Sicong Jiang
Max Ku
Ping Nie
Minghao Liu
Wenhu Chen
116
9
0
30 Sep 2025
PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
J. Lee
Jong Chul Ye
104
0
0
30 Sep 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
Kazuki Matsuda
Yuiga Wada
Shinnosuke Hirano
Seitaro Otsuki
Komei Sugiura
VLM
152
1
0
30 Sep 2025
Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models
Donghoon Kim
Dongyoung Lee
Ik Joon Chang
Sung-Ho Bae
MQ
144
0
0
30 Sep 2025
Fidelity-Aware Data Composition for Robust Robot Generalization
Zizhao Tong
Di Chen
Sicheng Hu
Hongwei Fan
Liliang Chen
Maoqing Yao
Hao Tang
Hao Dong
Ling Shao
136
1
0
29 Sep 2025
TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
Prerna Luthra
20
0
0
29 Sep 2025
GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models
Peter Holderrieth
Uriel Singer
Tommi Jaakkola
Ricky T. Q. Chen
Y. Lipman
Brian Karrer
DiffM
172
0
0
29 Sep 2025
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Bowei Chen
Sai Bi
Hao Tan
Chentao Song
Tianyuan Zhang
Zhengqi Li
Yuanjun Xiong
Jianming Zhang
Kai Zhang
210
4
0
29 Sep 2025
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Xiang Li
Zebang Shen
Ya-Ping Hsieh
Niao He
DiffM
1.4K
0
0
29 Sep 2025
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
Yiheng Zhang
Zhuojiang Cai
Mingdao Wang
Meitong Guo
Tianxiao Li
Li Lin
Yuwang Wang
3DV
164
0
0
28 Sep 2025
Diff-3DCap: Shape Captioning with Diffusion Models
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Zhenyu Shu
Jiawei Wen
Shiyang Li
Shiqing Xin
Ligang Liu
DiffM
123
0
0
28 Sep 2025
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
Amit Agarwal
Hitesh Laxmichand Patel
Srikant Panda
Hansa Meghwani
Jyotika Singh
Karan Dua
Paul Li
Tao Sheng
Sujith Ravi
Dan Roth
LRM
130
3
0
28 Sep 2025
Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric
Bingyang Cui
Yujie Zhang
Qi Yang
Zhu Li
Yiling Xu
229
0
0
28 Sep 2025
Enhancing Blind Face Restoration through Online Reinforcement Learning
Bin Wu
Yahui Liu
Chi Zhang
Yao-Min Zhao
Wei Wang
CVBM
OffRL
CLL
OnRL
424
0
0
27 Sep 2025
No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation
Mohammad Hossein Sameti
Amir M. Mansourian
Arash Marioriyad
Soheil Fadaee Oshyani
M. Rohban
M. Baghshah
93
0
0
27 Sep 2025
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
Yutao Shen
Junkun Yuan
Toru Aonishi
Hideki Nakayama
Yue Ma
EGVM
180
3
0
27 Sep 2025
CREPE: Controlling Diffusion with Replica Exchange
Jiajun He
Paul Jeha
Peter Potaptchik
Leo Zhang
José Miguel Hernández-Lobato
Yuanqi Du
Saifuddin Syed
Francisco Vargas
DiffM
97
0
0
27 Sep 2025
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Jonas Belouadi
T. Boubekeur
Adrien Kaiser
102
0
0
26 Sep 2025
Guidance Watermarking for Diffusion Models
Enoal Gesny
Eva Giboulot
Teddy Furon
Vivien Chappelier
WIGM
224
1
0
26 Sep 2025
Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
Agnieszka Polowczyk
Alicja Polowczyk
Joanna Waczyńska
Piotr Borycki
Przemysław Spurek
160
0
0
26 Sep 2025
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Seyedmorteza Sadat
Farnood Salehi
Romann M. Weber
DiffM
160
0
0
26 Sep 2025
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
Miao Jing
Mengting Jia
Junling Lin
Zhongxia Shen
Lijun Wang
Yuanyuan Peng
Huan Gao
VLM
ELM
LRM
312
0
0
26 Sep 2025
Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
Minjun Kang
Inkyu Shin
Taeyeop Lee
In So Kweon
Kuk-Jin Yoon
117
0
0
26 Sep 2025
FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
Muxi Chen
Zhaohua Zhang
Chenchen Zhao
Mingyang Chen
Wenyu Jiang
...
Jianhuan Zhuo
Yu Tang
Qiuyong Xiao
Jihong Zhang
Qiang Xu
94
1
0
26 Sep 2025
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
Melissa Roemmele
John Joon Young Chung
Taewook Kim
Yuqian Sun
Alex Calderwood
Max Kreminski
DiffM
124
1
0
26 Sep 2025
Rethinking Inter-LoRA Orthogonality in Adapter Merging: Insights from Orthogonal Monte Carlo Dropout
Andi Zhang
Xuan Ding
Haofan Wang
Steven McDonagh
Samuel Kaski
MoMe
181
0
0
26 Sep 2025
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
Qi Mao
Tinghan Yang
Jiahao Li
Bin Li
Libiao Jin
Yan Lu
152
0
0
26 Sep 2025
TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
Qihang Wang
Yaxiong Wang
Lechao Cheng
Zhun Zhong
DiffM
100
0
0
26 Sep 2025
Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication
Evgeny Kaskov
Elizaveta Petrova
Petr Surovtsev
Anna Kostikova
Ilya Mistiurin
A. Kapitanov
Alexander Nagaev
DiffM
333
0
0
25 Sep 2025
Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
S. Kasaei
Ali Aghayari
Arash Marioriyad
Niki Sepasian
MohammadAmin Fazli
Mahdieh Soleymani Baghshah
M. Rohban
EGVM
247
0
0
25 Sep 2025
VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment
M. Rahman
Kishor Datta Gupta
Marufa Kamal
Fahad Rahman
Sunzida Siddique
Ahmed Rafi Hasan
Mohd Ariful Haque
Roy George
215
0
0
25 Sep 2025
MMPlanner: Zero-Shot Multimodal Procedural Planning with Chain-of-Thought Object State Reasoning
Afrina Tabassum
Bin Guo
Xiyao Ma
Hoda Eldardiry
Ismini Lourentzou
LM&Ro
LRM
104
0
0
25 Sep 2025
Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data
Jiancheng Zhang
Yinglun Zhu
180
1
0
25 Sep 2025
Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models
Zoe Wanying He
Sean Trott
Meenakshi Khosla
VLM
124
1
0
25 Sep 2025
A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models
Qinqin He
Jiaqi Weng
Jialing Tao
Hui Xue
96
1
0
25 Sep 2025
A Unified Framework for Diffusion Model Unlearning with f-Divergence
Nicola Novello
Federico Fontana
Luigi Cinque
Deniz Gunduz
Andrea M. Tonello
226
0
0
25 Sep 2025
VC-Agent: An Interactive Agent for Customized Video Dataset Collection
Yidan Zhang
Mutian Xu
Yiming Hao
Kun Zhou
Jiahao Chang
Xiaoqiang Liu
Pengfei Wan
Hongbo Fu
Xiaoguang Han
VGen
172
0
0
25 Sep 2025
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori
Alessandro Conti
Lorenzo Vaquero
Yiming Wang
Elisa Ricci
Paolo Rota
124
1
0
23 Sep 2025
CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration
S. Kasaei
Ali Aghayari
Arash Marioriyad
Niki Sepasian
Shayan Baghayi Nejad
MohammadAmin Fazli
M. Baghshah
M. Rohban
DiffM
EGVM
240
0
0
22 Sep 2025
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim
Heeseong Shin
Eunbeen Hong
Heeji Yoon
Anurag Arnab
Paul Hongsuck Seo
Sunghwan Hong
Seungryong Kim
184
6
0
22 Sep 2025
VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation
Feng Han
Chao Gong
Zhipeng Wei
Yue Yu
Yu Jiang
DiffM
170
0
0
21 Sep 2025
VidCLearn: A Continual Learning Approach for Text-to-Video Generation
Luca Zanchetta
Lorenzo Papa
Luca Maiano
Irene Amerini
DiffM
VGen
120
0
0
21 Sep 2025
$\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation
Y. Li
Lebin Zhou
Nam Ling
Zhenghao Chen
Wei Wang
Wei Jiang
VGen
165
0
0
21 Sep 2025