2104.08718
Cited By
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing
"CLIPScore: A Reference-free Evaluation Metric for Image Captioning"
39 / 1,489 papers shown
Linearly Mapping from Image to Text Space
International Conference on Learning Representations (ICLR), 2022
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
1.2K
145
0
30 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Neural Information Processing Systems (NeurIPS), 2022
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
Amit K. Roy-Chowdhury
AAML
307
24
0
20 Sep 2022
Learning Distinct and Representative Styles for Image Captioning
Neural Information Processing Systems (NeurIPS), 2022
Qi Chen
Chaorui Deng
Qi Wu
VLM
187
27
0
17 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
363
4
0
15 Sep 2022
Every picture tells a story: Image-grounded controllable stylistic story generation
Holy Lovenia
Bryan Wilie
Romain Barraud
Samuel Cahyawijaya
Willy Chung
Pascale Fung
218
8
0
04 Sep 2022
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
AAAI Conference on Artificial Intelligence (AAAI), 2022
Wan-Cyuan Fan
Yen-Chun Chen
Dongdong Chen
Yu Cheng
Lu Yuan
Yu-Chiang Frank Wang
DiffM
252
114
0
29 Aug 2022
Deepfake: Definitions, Performance Metrics and Standards, Datasets and Benchmarks, and a Meta-Review
Enes ALTUNCU
V. N. Franqueira
Shujun Li
344
26
0
21 Aug 2022
ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design
ACM Multimedia (ACM MM), 2022
Xujie Zhang
Yuyang Sha
Michael C. Kampffmeyer
Zhenyu Xie
Zequn Jie
Chengwen Huang
Jianqing Peng
Xiaodan Liang
184
26
0
11 Aug 2022
A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
European Conference on Computer Vision (ECCV), 2022
Patsorn Sangkloy
Wittawat Jitkrittum
Diyi Yang
James Hays
3DV
190
42
0
05 Aug 2022
Exploring CLIP for Assessing the Look and Feel of Images
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jianyi Wang
Kelvin C. K. Chan
Chen Change Loy
VLM
430
977
0
25 Jul 2022
Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel
Yoav Shalev
Roy Nadler
Idan Schwartz
Lior Wolf
233
33
0
22 Jul 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
European Conference on Computer Vision (ECCV), 2022
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
218
148
0
20 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Signal Processing: Image Communication (SPIC), 2022
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
226
10
0
04 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
ACM Computing Surveys (ACM CSUR), 2022
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
265
158
0
03 Jul 2022
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
An Yan
Zhankui He
Jiacheng Li
Tianyang Zhang
Julian McAuley
265
54
0
30 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
International Conference on Learning Representations (ICLR), 2022
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
383
57
0
15 Jun 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Mohit Bansal
CLIP
391
98
0
26 May 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Neural Information Processing Systems (NeurIPS), 2022
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
348
41
0
25 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Neural Information Processing Systems (NeurIPS), 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
1.2K
7,527
0
23 May 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Elisa Kreiss
Cynthia L. Bennett
Shayan Hooshmand
E. Zelikman
Meredith Ringel Morris
Christopher Potts
246
35
0
21 May 2022
RankGen: Improving Text Generation with Large Ranking Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
335
79
0
19 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
274
111
0
05 May 2022
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiaoqiang Wang
Bang Liu
Siliang Tang
Lingfei Wu
205
12
0
29 Apr 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Daniel Deutsch
Dan Roth
AI4CE
245
2
0
29 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
218
14
0
12 Apr 2022
How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image
H. Choi
Yejun Yoon
Seunghyun Yoon
Kunwoo Park
157
9
0
12 Apr 2022
DT2I: Dense Text-to-Image Generation from Region Descriptions
International Conference on Artificial Neural Networks (ICANN), 2022
Stanislav Frolov
Prateek Bansal
Jörn Hees
Andreas Dengel
VLM
172
5
0
05 Apr 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Yixiang Chen
Xuwu Wang
Yanghua Xiao
N. Yuan
211
237
0
11 Feb 2022
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
243
122
0
09 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
239
45
0
08 Dec 2021
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
603
651
0
02 Dec 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Computer Vision and Pattern Recognition (CVPR), 2021
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
335
236
0
29 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
321
14
0
24 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
188
59
0
17 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
262
42
0
17 Nov 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
218
64
0
19 Oct 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
437
348
0
14 Jul 2021
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Findings (Findings), 2021
Wanrong Zhu
Xin Eric Wang
An Yan
Miguel P. Eckstein
William Yang Wang
147
7
0
10 Jun 2021
Concadia: Towards Image-Based Text Generation with a Purpose
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Elisa Kreiss
Fei Fang
Noah D. Goodman
Christopher Potts
267
25
0
16 Apr 2021