
CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

18 April 2021

Yejin Choi

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,488 papers shown

X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

Yixiong Chen

Li Liu

C. Ding

174

18 May 2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Anwen Hu

Shizhe Chen

Liang Zhang

Qin Jin

227

10 May 2023

iEdit: Localised Text-guided Image Editing with Weak Supervision

196

10 May 2023

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

139

08 May 2023

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

ACM Transactions on Graphics (TOG), 2023

Xin-Yang Zheng

Hao Pan

Peng-Shuai Wang

Xin Tong

Yang Liu

H. Shum

302

164

08 May 2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

ACM Multimedia (ACM MM), 2023

Yinpeng Dong

Hang Su

229

07 May 2023

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Joshua Ainslie

218

05 May 2023

The Role of Data Curation in Image Captioning

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

258

05 May 2023

Multimodal Data Augmentation for Image Captioning using Diffusion Models

207

03 May 2023

Multimodal Procedural Planning via Dual Text-Image Prompting

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

234

02 May 2023

SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis

Nassir Navab

226

28 Apr 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

Wei Chen

Linli Yao

Qin Jin

VLM

288

21 Apr 2023

Soundini: Sound-Guided Diffusion for Natural Video Editing

Feng Yang

188

13 Apr 2023

A-CAP: Anticipation Captioning with Commonsense Knowledge

Computer Vision and Pattern Recognition (CVPR), 2023

149

13 Apr 2023

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

447

143

12 Apr 2023

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

IEEE International Conference on Computer Vision (ICCV), 2023

307

104

11 Apr 2023

OpenAGI: When LLM Meets Domain Experts

Neural Information Processing Systems (NeurIPS), 2023

Juntao Tan

317

308

10 Apr 2023

Model-Agnostic Gender Debiased Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2023

339

07 Apr 2023

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

IEEE International Conference on Computer Vision (ICCV), 2023

212

04 Apr 2023

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2023

Esa Rahtu

Shin'ichi Satoh

224

04 Apr 2023

Cross-Domain Image Captioning with Discriminative Finetuning

Computer Vision and Pattern Recognition (CVPR), 2023

Roberto Dessì

Michele Bevilacqua

Eleonora Gualdoni

Nathanaël Carraz Rakotonirina

Francesca Franzon

Marco Baroni

CLIP

248

04 Apr 2023

Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

IEEE International Conference on Computer Vision (ICCV), 2023

239

04 Apr 2023

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

AAAI Conference on Artificial Intelligence (AAAI), 2023

Yue Ma

Xiaodong Cun

Ying Shan

275

274

03 Apr 2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Hao Chen

Chunhua Shen

268

135

30 Mar 2023

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

251

29 Mar 2023

Hierarchical Video-Moment Retrieval and Step-Captioning

Computer Vision and Pattern Recognition (CVPR), 2023

276

29 Mar 2023

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

248

28 Mar 2023

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Jian Yang

398

28 Mar 2023

Fine-grained Audible Video Description

Computer Vision and Pattern Recognition (CVPR), 2023

Zhen Qin

...

Yuchao Dai

Lingpeng Kong

Meng Wang

Yu Qiao

Yiran Zhong

VGen

175

27 Mar 2023

Ablating Concepts in Text-to-Image Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

Jun-Yan Zhu

482

283

23 Mar 2023

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

IEEE International Conference on Computer Vision (ICCV), 2023

308

733

23 Mar 2023

Zero-guidance Segmentation Using Zero Segment Labels

IEEE International Conference on Computer Vision (ICCV), 2023

Pitchaporn Rewatbowornwong

Nattanat Chatthee

Ekapol Chuangsuwanich

Supasorn Suwajanakorn

VLM

173

23 Mar 2023

Pix2Video: Video Editing using Image Diffusion

IEEE International Conference on Computer Vision (ICCV), 2023

Duygu Ceylan

C. Huang

Niloy J. Mitra

DiffM VGen

412

339

22 Mar 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Computer Vision and Pattern Recognition (CVPR), 2023

Lorenzo Baraldi

335

21 Mar 2023

VideoXum: Cross-modal Visual and Textural Summarization of Videos

IEEE Transactions on Multimedia (IEEE TMM), 2023

381

21 Mar 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

IEEE International Conference on Computer Vision (ICCV), 2023

Mari Ostendorf

337

344

21 Mar 2023

VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

Arushi Rai

Adriana Kovashka

290

16 Mar 2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

IEEE International Conference on Computer Vision (ICCV), 2023

Zhiheng Liu

Xiaodong Cun

Yong Zhang

Ying Shan

413

466

16 Mar 2023

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

270

15 Mar 2023

Editing Implicit Assumptions in Text-to-Image Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

368

116

14 Mar 2023

Text-to-image Diffusion Models in Generative AI: A Survey

Chenshuang Zhang

Chaoning Zhang

Mengchun Zhang

In So Kweon

VLM

315

380

14 Mar 2023

Scaling up GANs for Text-to-Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2023

Jun-Yan Zhu

328

601

09 Mar 2023

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

IEEE International Conference on Computer Vision (ICCV), 2023

373

06 Mar 2023

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

International Conference on Learning Representations (ICLR), 2023

Yi Yang

229

119

06 Mar 2023

Models See Hallucinations: Evaluating the Factuality in Video Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Hui Liu

Xiaojun Wan

HILM

183

06 Mar 2023

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Computer Vision and Pattern Recognition (CVPR), 2023

Dongsheng Wang

225

04 Mar 2023

X&Fuse: Fusing Visual Information in Text-to-Image Generation

02 Mar 2023

Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning

International Conference on Learning Representations (ICLR), 2023

195

28 Feb 2023

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

AAAI Conference on Artificial Intelligence (AAAI), 2023

363

25 Feb 2023

Learning Visual Representations via Language-Guided Sampling

Computer Vision and Pattern Recognition (CVPR), 2023

398

23 Feb 2023