arXiv:2104.08718 (v3)
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning" (50 of 1,488 shown)
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
316
2
0
24 Dec 2025
ShadowDraw: From Any Object to Shadow-Drawing Compositional Art
Rundong Luo
Noah Snavely
Wei-Chiu Ma
DiffM
125
0
0
04 Dec 2025
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
NaHyeon Park
Namin An
Kunhee Kim
Soyeon Yoon
Jiahao Huo
Hyunjung Shim
VLM
95
0
0
04 Dec 2025
Refaçade: Editing Object with Given Reference Texture
Youze Huang
Penghui Ruan
Bojia Zi
Xianbiao Qi
Jianan Wang
Rong Xiao
DiffM
169
0
0
04 Dec 2025
Value Gradient Guidance for Flow Matching Alignment
Zhen Liu
Tim Z. Xiao
Carles Domingo-Enrich
Weiyang Liu
Dinghuai Zhang
54
0
0
04 Dec 2025
I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
Juntong Wang
Jiarui Wang
Huiyu Duan
Jiaxiang Kang
Guangtao Zhai
Xiongkuo Min
VLM
169
0
0
04 Dec 2025
Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
Hang Xu
Linjiang Huang
Feng Zhao
DiffM
117
0
0
03 Dec 2025
GeoVideo: Introducing Geometric Regularization into Video Generation Model
Yunpeng Bai
Shaoheng Fang
Chaohui Yu
Fan Wang
Qixing Huang
DiffM
VGen
MDE
452
2
0
03 Dec 2025
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
J. Li
Bin Li
Jiahao Li
Yan Lu
112
0
0
03 Dec 2025
Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Joan Nwatu
Longju Bai
Oana Ignat
Rada Mihalcea
71
0
0
02 Dec 2025
IC-World: In-Context Generation for Shared World Modeling
Fan Wu
Jiacheng Wei
Ruibo Li
Yi Tian Xu
Junyou Li
Deheng Ye
Guosheng Lin
VGen
64
0
0
01 Dec 2025
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
Xavier Thomas
Youngsun Lim
Ananya Srinivasan
Audrey Zheng
Deepti Ghadiyaram
EGVM
VGen
316
0
0
01 Dec 2025
FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Kevin David Hayes
Micah Goldblum
Vikash Sehwag
Gowthami Somepalli
Ashwinee Panda
Tom Goldstein
MLLM
EGVM
240
0
0
01 Dec 2025
BioPro: On Difference-Aware Gender Fairness for Vision-Language Models
Y. Lin
Jiayao Ma
Qingguo Hu
Derek F. Wong
Jinsong Su
64
0
0
30 Nov 2025
Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models
Cen Lu
Yung-Chen Tang
Andrea Cavallaro
53
0
0
30 Nov 2025
Multilingual Training-Free Remote Sensing Image Captioning
Carlos Rebelo
Gil Rocha
João Daniel Silva
Bruno Martins
106
0
0
30 Nov 2025
SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control
Ji Gan
Lingxu Chen
Jiaxu Leng
Xinbo Gao
3DGS
190
0
0
29 Nov 2025
FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal
Hang Xu
Linjiang Huang
Feng Zhao
102
0
0
29 Nov 2025
Vision Bridge Transformer at Scale
Zhenxiong Tan
Zeqing Wang
Xingyi Yang
Songhua Liu
Xinchao Wang
DiffM
100
0
0
28 Nov 2025
GOATex: Geometry & Occlusion-Aware Texturing
Hyunjin Kim
Kunho Kim
Adam Lee
Wonkwang Lee
DiffM
98
0
0
28 Nov 2025
InstanceV: Instance-Level Video Generation
Yuheng Chen
Teng Hu
Jiangning Zhang
Zhucun Xue
Ran Yi
Lizhuang Ma
DiffM
VGen
120
0
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
168
0
0
28 Nov 2025
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
Yiming Chen
Junlin Han
Tianyi Bai
Shengbang Tong
Filippos Kokkinos
Philip Torr
87
0
0
27 Nov 2025
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Riccardo De Santi
Marin Vlastelica
Ya-Ping Hsieh
Zebang Shen
Niao He
Andreas Krause
AI4CE
78
0
0
27 Nov 2025
AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
Zhenglin Zhou
Fan Ma
Chengzhuo Gui
Xiaobo Xia
Hehe Fan
Yi Yang
Tat-Seng Chua
157
0
0
27 Nov 2025
CameraMaster: Unified Camera Semantic-Parameter Control for Photography Retouching
Qirui Yang
Yang Yang
Ying Zeng
Xiaobin Hu
Bo Li
Huanjing Yue
Jingyu Yang
P. Jiang
DiffM
VGen
311
0
0
26 Nov 2025
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
Joonhyung Park
Hyeongwon Jang
Joowon Kim
Eunho Yang
VLM
153
0
0
26 Nov 2025
CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion
Shuhan Xia
Jing Dai
Hui Ouyang
Yadong Shang
Dongxiao Zhao
Peipei Li
DiffM
AAML
457
0
0
26 Nov 2025
Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion
Samuele Dell'Erba
Andrew D. Bagdanov
176
0
0
25 Nov 2025
CREward: A Type-Specific Creativity Reward Model
Jiyeon Han
Ali Mahdavi-Amiri
Hao Zhang
Haedong Jeong
105
0
0
25 Nov 2025
EmoFeedback²: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback
Jingyang Jia
Kai Shu
Gang Yang
Long Xing
Xun Chen
Aiping Liu
EGVM
395
1
0
25 Nov 2025
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
Yuwei Niu
Weiyang Jin
Jiaqi Liao
Chaoran Feng
Peng Jin
Bin Lin
Zongjian Li
Bin Zhu
Weihao Yu
Li Yuan
SyDa
LRM
456
0
0
25 Nov 2025
Text-guided Controllable Diffusion for Realistic Camouflage Images Generation
Yuhang Qian
Haiyan Chen
Wentong Li
Ningzhong Liu
Jie Qin
DiffM
198
1
0
25 Nov 2025
Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation
Ruojun Xu
Yu Kai
Xuhua Ren
Jiaxiang Cheng
Bing Ma
Tianxiang Zheng
Qinhlin Lu
EGVM
159
0
0
24 Nov 2025
Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation
Shristi Das Biswas
Arani Roy
Kaushik Roy
VGen
263
0
0
24 Nov 2025
Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models
Heejoon Koo
121
0
0
23 Nov 2025
ConsistCompose: Unified Multimodal Layout Control for Image Composition
Xuanke Shi
B. Li
Xiaoyang Han
Zhongang Cai
Lei Yang
Dahua Lin
Quan-ding Wang
MLLM
385
0
0
23 Nov 2025
Synthetic Curriculum Reinforces Compositional Text-to-Image Generation
Shijian Wang
Runhao Fu
Siyi Zhao
Qingqin Zhan
Xingjian Wang
Jiarui Jin
Yuan Lu
Hanqian Wu
Cunjian Chen
EGVM
226
0
0
23 Nov 2025
MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference
Zitong Xu
Dake Shen
Yaosong Du
Kexiang Hao
Jinghan Huang
Xiande Huang
75
0
0
23 Nov 2025
Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models
Tianyang Han
Junhao Su
J. Hu
Peizhen Yang
Hengyu Shi
Junfeng Luo
Jialin Gao
EGVM
VGen
480
0
0
23 Nov 2025
Refracting Reality: Generating Images with Realistic Transparent Objects
Yue Yin
Enze Tao
Dylan Campbell
DiffM
166
0
0
21 Nov 2025
Counterfactual World Models via Digital Twin-conditioned Video Diffusion
Yiqing Shen
Aiza Maksutova
Chenjia Li
Mathias Unberath
DiffM
VGen
165
0
0
21 Nov 2025
RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation
Wenzhuo Sun
Mingjian Liang
Wenxuan Song
Xuelian Cheng
Zongyuan Ge
3DV
222
0
0
21 Nov 2025
Personalized Reward Modeling for Text-to-Image Generation
Jeongeun Lee
Ryang Heo
Dongha Lee
EGVM
153
0
0
21 Nov 2025
Diversity Has Always Been There in Your Visual Autoregressive Models
Tong Wang
Guanyu Yang
Nian Liu
Kai Wang
Yaxing Wang
Abdelrahman M. Shaker
Salman Khan
Fahad Shahbaz Khan
S. Li
136
0
0
21 Nov 2025
Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation
Chuancheng Shi
Shangze Li
Shiming Guo
Simiao Xie
Wenhua Wu
...
Canran Xiao
Cong Wang
Zifeng Cheng
Fei Shen
Tat-Seng Chua
VLM
225
0
0
21 Nov 2025
Physics-Based Benchmarking Metrics for Multimodal Synthetic Images
Kishor Datta Gupta
Marufa Kamal
M. Rahman
Fahad Rahman
Mohd Ariful Haque
Sunzida Siddique
131
0
0
19 Nov 2025
Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition
Raghu Chittersu
Yuvraj Singh Rathore
Pranav Adlinge
Kunal Swami
DiffM
260
0
0
19 Nov 2025
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Yuxuan Gu
Weimin Bai
Yifei Wang
Weijian Luo
H. Sun
DiffM
OffRL
246
0
0
19 Nov 2025
Distribution Matching Distillation Meets Reinforcement Learning
Dengyang Jiang
Dongyang Liu
Zanyi Wang
Qilong Wu
Liuzhuozheng Li
...
Bo Zhang
Mengmeng Wang
Steven Hoi
Peng Gao
H. Yang
402
0
0
17 Nov 2025