CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

18 April 2021

Yejin Choi

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 1,488 papers shown

Side Adapter Network for Open-Vocabulary Semantic Segmentation. Computer Vision and Pattern Recognition (CVPR), 2023

311

362

23 Feb 2023

Aligning Text-to-Image Models using Human Feedback

Pieter Abbeel

338

383

23 Feb 2023

Test-Time Distribution Normalization for Contrastively Learned Vision-language Models. Neural Information Processing Systems (NeurIPS), 2023

Ser-Nam Lim

240

22 Feb 2023

RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions. International Conference on Human Factors in Computing Systems (CHI), 2023

Yunlong Wang

Shuyuan Shen

Brian Y. Lim

341

121

19 Feb 2023

Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics. Neural Information Processing Systems (NeurIPS), 2023

251

09 Feb 2023

Q-Diffusion: Quantizing Diffusion Models. IEEE International Conference on Computer Vision (ICCV), 2023

Zhen Dong

Shanghang Zhang

374

236

08 Feb 2023

Auditing Gender Presentation Differences in Text-to-Image Models. Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023

Diyi Yang

337

07 Feb 2023

Zero-shot Image-to-Image Translation. International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023

Jun-Yan Zhu

306

559

06 Feb 2023

Dreamix: Video Diffusion Models are General Video Editors

Yossi Matias

Yael Pritch

Yaniv Leviathan

Yedid Hoshen

DiffM VGen

304

216

02 Feb 2023

IC3: Image Captioning by Committee Consensus. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

David M. Chan

Austin Myers

Sudheendra Vijayanarasimhan

David A. Ross

John F. Canny

296

02 Feb 2023

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Chen Chen

Albin Madappally Jose

234

30 Jan 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. International Conference on Machine Learning (ICML), 2023

Rongjie Huang

Dongchao Yang

Zhou Zhao

392

427

30 Jan 2023

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. International Conference on Machine Learning (ICML), 2023

324

262

23 Jan 2023

Embodied Agents for Efficient Exploration and Smart Scene Description. IEEE International Conference on Robotics and Automation (ICRA), 2023

Lorenzo Baraldi

165

17 Jan 2023

ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions

Aashish Anantha Ramakrishnan

Sharon X. Huang

Dongwon Lee

262

05 Jan 2023

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning. IEEE International Conference on Computer Vision (ICCV), 2022

245

27 Dec 2022

When are Lemons Purple? The Concept Association Bias of Vision-Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

293

22 Dec 2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. IEEE International Conference on Computer Vision (ICCV), 2022

Ying Shan

351

1,000

22 Dec 2022

Character-Aware Models Improve Visual Text Rendering. Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Sharan Narang

244

20 Dec 2022

Trustworthy Social Bias Measurement. AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022

Rishi Bommasani

Abigail Z. Jacobs

243

20 Dec 2022

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Tianxing He

Jingyu Zhang

Tianle Wang

383

20 Dec 2022

Benchmarking Spatial Relationships in Text-to-Image Generation

Yezhou Yang

361

20 Dec 2022

One Embedder, Any Task: Instruction-Finetuned Text Embeddings. Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Weijia Shi

Mari Ostendorf

Luke Zettlemoyer

278

395

19 Dec 2022

Cross-Modal Similarity-Based Curriculum Learning for Image Captioning. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

146

14 Dec 2022

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting. Computer Vision and Pattern Recognition (CVPR), 2022

...

223

250

13 Dec 2022

CREPE: Can Vision-Language Foundation Models Reason Compositionally? Computer Vision and Pattern Recognition (CVPR), 2022

371

180

13 Dec 2022

Multi-Concept Customization of Text-to-Image Diffusion. Computer Vision and Pattern Recognition (CVPR), 2022

Jun-Yan Zhu

689

1,162

08 Dec 2022

DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset. North American Chapter of the Association for Computational Linguistics (NAACL), 2022

296

08 Dec 2022

Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

Ukyo Honda

Taro Watanabe

Yuji Matsumoto

215

06 Dec 2022

ObjectStitch: Generative Object Compositing

Zhifei Zhang

294

02 Dec 2022

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs. Computer Vision and Pattern Recognition (CVPR), 2022

371

127

01 Dec 2022

Exploring Discrete Diffusion Models for Image Captioning

Zicheng Liu

255

21 Nov 2022

Video Background Music Generation: Dataset, Method and Evaluation. IEEE International Conference on Computer Vision (ICCV), 2022

251

21 Nov 2022

How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation

Jie Ruan

Yue Wu

Xiaojun Wan

Yuesheng Zhu

139

20 Nov 2022

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge. The Web Conference (WWW), 2022

Linli Yao

Wei Chen

Qin Jin

VLM

314

17 Nov 2022

Large-Scale Bidirectional Training for Zero-Shot Image Captioning

210

13 Nov 2022

I Hear Your True Colors: Image Guided Audio Generation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Roy Sheffer

Yossi Adi

VLM

235

104

06 Nov 2022

Evaluating and Improving Factuality in Multimodal Abstractive Summarization. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

David Wan

Joey Tianyi Zhou

172

04 Nov 2022

UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance

...

493

28 Oct 2022

SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation. Social Science Research Network (SSRN), 2022

Anh Nguyen

183

27 Oct 2022

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models. British Machine Vision Conference (BMVC), 2022

156

27 Oct 2022

On the Limitations of Reference-Free Evaluations of Generated Text. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Daniel Deutsch

Rotem Dror

Dan Roth

278

22 Oct 2022

Instance-Aware Image Completion

169

22 Oct 2022

DiffEdit: Diffusion-based semantic image editing with mask guidance. International Conference on Learning Representations (ICLR), 2022

385

653

20 Oct 2022

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

166

18 Oct 2022

Imagic: Text-Based Real Image Editing with Diffusion Models. Computer Vision and Pattern Recognition (CVPR), 2022

543

1,329

17 Oct 2022

Imagen Video: High Definition Video Generation with Diffusion Models

Ruiqi Gao

...

David J. Fleet

420

1,850

05 Oct 2022

Vision+X: A Survey on Multimodal Learning in the Light of Data. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Ye Zhu

Yuehua Wu

Andrii Zadaianchuk

Yan Yan

354

05 Oct 2022

Affection: Learning Affective Explanations for Real-World Visual Data. Computer Vision and Pattern Recognition (CVPR), 2022

173

04 Oct 2022

Linearly Mapping from Image to Text Space. International Conference on Learning Representations (ICLR), 2022

1.2K

145

30 Sep 2022