2104.08718
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing
"CLIPScore: A Reference-free Evaluation Metric for Image Captioning"
50 / 1,489 papers shown
Cost Savings from Automatic Quality Assessment of Generated Images
Xavier Giró-i-Nieto
Nefeli Andreou
Anqi Liang
Manel Baradad
Francesc Moreno-Noguer
Aleix M. Martinez
256
0
0
17 Oct 2025
BLIP3o-NEXT: Next Frontier of Native Image Generation
Jiuhai Chen
Le Xue
Zhiyang Xu
Xichen Pan
Shusheng Yang
...
Tianyi Zhou
Junnan Li
Silvio Savarese
Caiming Xiong
Ran Xu
113
13
0
17 Oct 2025
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Ming Gui
Johannes Schusterbauer
Timy Phan
Felix Krause
J. Susskind
Miguel Angel Bautista
Bjorn Ommer
201
1
0
16 Oct 2025
Consistent text-to-image generation via scene de-contextualization
Song Tang
Peihao Gong
Kunyu Li
Kai Guo
Boyu Wang
Mao Ye
Jianwei Zhang
X. Zhu
DiffM
126
0
0
16 Oct 2025
LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models
Mert Sonmezer
Matthew Zheng
Pinar Yanardag
DiffM
MoMe
339
1
0
16 Oct 2025
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
Yu Zhou
Sohyun An
Haikang Deng
Da Yin
Clark Peng
Cho-Jui Hsieh
Kai-Wei Chang
Nanyun Peng
VLM
147
1
0
16 Oct 2025
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Junliang Ye
Shenghao Xie
R. Zhao
Zhengyi Wang
Hongyu Yan
Wenqiang Zu
Lei Ma
Jun Zhu
DiffM
203
4
0
16 Oct 2025
VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
Hyojun Go
Dominik Narnhofer
Goutam Bhat
Prune Truong
Federico Tombari
Konrad Schindler
VGen
171
2
0
15 Oct 2025
Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models
Jia Yun Chua
Argyrios Zolotas
Miguel Arana-Catania
80
0
0
15 Oct 2025
Counting Hallucinations in Diffusion Models
Shuai Fu
Jian Zhou
Qi Chen
Huang Jing
Huy Anh Nguyen
Xiaohan Liu
Zhixiong Zeng
Lin Ma
Quanshi Zhang
Qi Wu
DiffM
HILM
295
0
0
15 Oct 2025
FlashWorld: High-quality 3D Scene Generation within Seconds
Xinyang Li
Tengfei Wang
Zixiao Gu
Shengchuan Zhang
Chunchao Guo
Liujuan Cao
3DGS
161
5
0
15 Oct 2025
Unifying Vision-Language Latents for Zero-label Image Caption Enhancement
Sanghyun Byun
Jung Guack
Mohanad Odema
Baisub Lee
Jacob Song
Woo Seong Chung
VLM
98
0
0
14 Oct 2025
VIDMP3: Video Editing by Representing Motion with Pose and Position Priors
Sandeep Mishra
Oindrila Saha
A. Bovik
DiffM
VGen
130
0
0
14 Oct 2025
Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications
Belkiss Souayed
Sarah Ebling
Yingqiang Gao
96
0
0
13 Oct 2025
Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection
Saroj Basnet
Shafkat Farabi
Tharindu Ranasinghe
Diptesh Kanojia
Marcos Zampieri
89
0
0
13 Oct 2025
COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision Language Models
Sanchit Sinha
Guangzhi Xiong
Aidong Zhang
CoGe
LRM
VLM
184
0
0
13 Oct 2025
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
Haomin Wang
Jinhui Yin
Qi Wei
Wenguang Zeng
Lixin Gu
...
Yanwen Guo
Wenhai Wang
Kai Chen
Yu Qiao
Hongjie Zhang
VLM
190
2
0
13 Oct 2025
OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment
Yiting Lu
Fengbin Guan
Yixin Gao
Yan Zhong
Xinge Peng
...
Y. Liu
Bo Zhang
Xin Li
Zhibo Chen
Weisi Lin
OffRL
140
0
0
12 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
151
1
0
12 Oct 2025
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
Yichen Yan
Ming Zhong
Qi Zhu
Xiaoling Gu
Jinpeng Chen
Huan Li
129
0
0
11 Oct 2025
Few-shot multi-token DreamBooth with LoRa for style-consistent character generation
Ruben Pascual
Mikel Sesma-Sara
A. Jurio
D. Paternain
M. Galar
DiffM
VGen
104
0
0
10 Oct 2025
PhyDAE: Physics-Guided Degradation-Adaptive Experts for All-in-One Remote Sensing Image Restoration
Zhe Dong
Yuzhe Sun
Haochen Jiang
Tianzhu Liu
Yanfeng Gu
101
1
0
09 Oct 2025
FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching
Jiacheng Liu
Peiliang Cai
Qinming Zhou
Yuqi Lin
Deyang Kong
...
Haowen Xu
Chang Zou
J. Tang
S. Zheng
Linfeng Zhang
105
1
0
09 Oct 2025
One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting
Haipeng Liu
Yang Wang
M. Y. Wang
DiffM
520
4
0
09 Oct 2025
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
Zhentao Zou
Zhengrong Yue
Kunpeng Du
Binlei Bao
Hanting Li
...
Yue Zhou
Yali Wang
Jie Hu
Xue Jiang
X. Chen
LRM
183
0
0
09 Oct 2025
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters
Soroush Mehraban
Vida Adeli
Jacob Rommann
Babak Taati
Kyryl Truskovskyi
DiffM
VGen
94
0
0
08 Oct 2025
VUGEN: Visual Understanding priors for GENeration
Xiangyi Chen
Théophane Vallaeys
Maha Elbayad
John Nguyen
Jakob Verbeek
VLM
140
0
0
08 Oct 2025
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye
Zhaocheng Liu
Yuwei Gui
Tingyu Yuan
Yunyue Su
Bowen Fang
Chaoyang Zhao
Qiang Liu
Liang Wang
LLMAG
88
0
0
08 Oct 2025
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
Junhan Zhu
Hesong Wang
Mingluo Su
Zefang Wang
Huan Wang
DiffM
VLM
289
1
0
08 Oct 2025
Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation
Zhangdie Yuan
Han-Chin Shing
Mitch Strong
Chaitanya P. Shivade
132
0
0
08 Oct 2025
LASER: An LLM-based ASR Scoring and Evaluation Rubric
Amruta Parulekar
Preethi Jyothi
112
1
0
08 Oct 2025
Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model
Danush Kumar Venkatesh
Adam Schmidt
Muhammad Abdullah Jamal
Omid Mohareri
VGen
MedIm
144
0
0
07 Oct 2025
Teamwork: Collaborative Diffusion with Low-rank Coordination and Adaptation
Sam Sartor
Pieter Peers
DiffM
160
1
0
07 Oct 2025
Uncertainty in Machine Learning
Hans Weytjens
Wouter Verbeke
UD
259
0
0
07 Oct 2025
Riddled basin geometry sets fundamental limits to predictability and reproducibility in deep learning
Andrew Ly
Pulin Gong
AI4CE
187
0
0
07 Oct 2025
Unsupervised Active Learning via Natural Feature Progressive Framework
Yuxi Liu
Catherine Lalman
Yimin Yang
150
0
0
06 Oct 2025
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
Hyunmin Cho
Donghoon Ahn
S. Hong
J. Kim
Seungryong Kim
Kyong Hwan Jin
DiffM
146
0
0
06 Oct 2025
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
Kun Xiang
Terry Jingchen Zhang
Yinya Huang
Jixi He
Zirong Liu
...
J. N. Han
Hang Xu
Han Li
Bin Dong
Xiaodan Liang
PINN
AI4CE
378
1
0
06 Oct 2025
Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning
Xiaomeng Fan
Yuchuan Mao
Zhi Gao
Yuwei Wu
Jin Chen
Yunde Jia
164
1
0
06 Oct 2025
ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation
Haoqi Wu
Wei Dai
Ming Xu
Li Wang
Qiang Yan
DiffM
169
0
0
05 Oct 2025
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
Shikang Zheng
Guantao Chen
Qinming Zhou
Yuqi Lin
Lixuan He
Chang Zou
Peiliang Cai
Jiacheng Liu
Linfeng Zhang
153
2
0
05 Oct 2025
Activation Steering with a Feedback Controller
Dung V. Nguyen
Hieu M. Vu
Nhi Y. Pham
Lei Zhang
T. Nguyen
LLMSV
192
0
0
05 Oct 2025
WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning
Peichao Lai
Jinhui Zhuang
Kexuan Zhang
Ningchang Xiong
Shengjie Wang
Yanwei Xu
Chong Chen
Yilei Wang
Bin Cui
182
0
0
05 Oct 2025
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Byungjun Kim
Soobin Um
Jong Chul Ye
164
0
0
04 Oct 2025
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
John Nguyen
Marton Havasi
Tariq Berrada
Luke Zettlemoyer
Ricky T. Q. Chen
208
4
0
03 Oct 2025
Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models
Benjamin Yu
Jackie Liu
Justin Cui
133
1
0
03 Oct 2025
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
Tianren Ma
Mu Zhang
Yibing Wang
Qixiang Ye
90
1
0
03 Oct 2025
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
Lorenzo Bianchi
Giacomo Pacini
F. Carrara
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
VLM
179
0
0
03 Oct 2025
TIT-Score: Evaluating Long-Prompt Based Text-to-Image Alignment via Text-to-Image-to-Text Consistency
Juntong Wang
Huiyu Duan
Jiarui Wang
Ziheng Jia
Guangtao Zhai
Xiongkuo Min
EGVM
ALM
LM&MA
VLM
152
2
0
03 Oct 2025
PEO: Training-Free Aesthetic Quality Enhancement in Pre-Trained Text-to-Image Diffusion Models with Prompt Embedding Optimization
Hovhannes Margaryan
Bo Wan
Tinne Tuytelaars
282
0
0
02 Oct 2025