CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

18 April 2021

Yejin Choi

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,488 papers shown

PEO: Training-Free Aesthetic Quality Enhancement in Pre-Trained Text-to-Image Diffusion Models with Prompt Embedding Optimization

Hovhannes Margaryan

Bo Wan

Tinne Tuytelaars

280

02 Oct 2025

Learn to Guide Your Diffusion Model

437

01 Oct 2025

Multi-Objective Task-Aware Predictor for Image-Text Alignment

133

01 Oct 2025

Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories

Baharan Mirzasoleiman

100

01 Oct 2025

ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning

134

01 Oct 2025

VIRTUE: Visual-Interactive Text-Image Universal Embedder

143

01 Oct 2025

FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos

30 Sep 2025

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

116

30 Sep 2025

PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models

J. Lee

Jong Chul Ye

104

30 Sep 2025

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions

152

30 Sep 2025

Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models

144

30 Sep 2025

Fidelity-Aware Data Composition for Robust Robot Generalization

136

29 Sep 2025

TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation

Prerna Luthra

29 Sep 2025

GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

172

29 Sep 2025

Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models

210

29 Sep 2025

When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis

1.4K

29 Sep 2025

M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation

164

28 Sep 2025

Diff-3DCap: Shape Captioning with Diffusion Models

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025

123

28 Sep 2025

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks

Amit Agarwal

Hitesh Laxmichand Patel

130

28 Sep 2025

Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric

229

28 Sep 2025

Enhancing Blind Face Restoration through Online Reinforcement Learning

424

27 Sep 2025

No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation

Mohammad Hossein Sameti

Amir M. Mansourian

Arash Marioriyad

Soheil Fadaee Oshyani

M. Rohban

M. Baghshah

27 Sep 2025

Follow-Your-Preference: Towards Preference-Aligned Image Inpainting

180

27 Sep 2025

CREPE: Controlling Diffusion with Replica Exchange

José Miguel Hernández-Lobato

27 Sep 2025

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

Jonas Belouadi

T. Boubekeur

Adrien Kaiser

102

26 Sep 2025

Guidance Watermarking for Diffusion Models

224

26 Sep 2025

Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models

160

26 Sep 2025

HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models

160

26 Sep 2025

Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

312

26 Sep 2025

Drag4D: Align Your Motion with Text-Driven 3D Scene Generation

117

26 Sep 2025

FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration

...

26 Sep 2025

LLMs Behind the Scenes: Enabling Narrative Scene Illustration

Melissa Roemmele

John Joon Young Chung

124

26 Sep 2025

Rethinking Inter-LoRA Orthogonality in Adapter Merging: Insights from Orthogonal Monte Carlo Dropout

181

26 Sep 2025

UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration

152

26 Sep 2025

TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation

100

26 Sep 2025

Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication

333

25 Sep 2025

Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation

Mahdieh Soleymani Baghshah

M. Rohban

247

25 Sep 2025

VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment

215

25 Sep 2025

MMPlanner: Zero-Shot Multimodal Procedural Planning with Chain-of-Thought Object State Reasoning

104

25 Sep 2025

Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data

Jiancheng Zhang

Yinglun Zhu

180

25 Sep 2025

Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models

124

25 Sep 2025

A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models

25 Sep 2025

A Unified Framework for Diffusion Model Unlearning with f-Divergence

226

25 Sep 2025

VC-Agent: An Interactive Agent for Customized Video Dataset Collection

172

25 Sep 2025

ConViS-Bench: Estimating Video Similarity Through Semantic Concepts

124

23 Sep 2025

CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration

240

22 Sep 2025

Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers

184

22 Sep 2025

VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation

170

21 Sep 2025

VidCLearn: A Continual Learning Approach for Text-to-Video Generation

120

21 Sep 2025

$\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation

165

21 Sep 2025