2104.08718
Cited By
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"
50 / 1,488 papers shown
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Mengde Xu
Zheng Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
311
362
0
23 Feb 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
338
383
0
23 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Neural Information Processing Systems (NeurIPS), 2023
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
240
21
0
22 Feb 2023
RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions
International Conference on Human Factors in Computing Systems (CHI), 2023
Yunlong Wang
Shuyuan Shen
Brian Y. Lim
341
121
0
19 Feb 2023
Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics
Neural Information Processing Systems (NeurIPS), 2023
Anton Voronov
Mikhail Khoroshikh
Artem Babenko
Max Ryabinin
251
8
0
09 Feb 2023
Q-Diffusion: Quantizing Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Xiuyu Li
Yijia Liu
Long Lian
Hua Yang
Zhen Dong
Daniel Kang
Shanghang Zhang
Kurt Keutzer
DiffM
MQ
374
236
0
08 Feb 2023
Auditing Gender Presentation Differences in Text-to-Image Models
Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023
Yanzhe Zhang
Lu Jiang
Greg Turk
Diyi Yang
EGVM
337
27
0
07 Feb 2023
Zero-shot Image-to-Image Translation
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023
Gaurav Parmar
Krishna Kumar Singh
Richard Y. Zhang
Yijun Li
Jingwan Lu
Jun-Yan Zhu
DiffM
306
559
0
06 Feb 2023
Dreamix: Video Diffusion Models are General Video Editors
Eyal Molad
Eliahu Horwitz
Dani Valevski
Alex Rav-Acha
Yossi Matias
Yael Pritch
Yaniv Leviathan
Yedid Hoshen
DiffM
VGen
304
216
0
02 Feb 2023
IC3: Image Captioning by Committee Consensus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
296
23
0
02 Feb 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM
3DV
234
25
0
30 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
International Conference on Machine Learning (ICML), 2023
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
392
427
0
30 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
International Conference on Machine Learning (ICML), 2023
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
324
262
0
23 Jan 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
IEEE International Conference on Robotics and Automation (ICRA), 2023
Roberto Bigazzi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
165
9
0
17 Jan 2023
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions
Aashish Anantha Ramakrishnan
Sharon X. Huang
Dongwon Lee
262
6
0
05 Jan 2023
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2022
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
245
27
0
27 Dec 2022
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yutaro Yamada
Yingtian Tang
Yoyo Zhang
Ilker Yildirim
CoGe
293
22
0
22 Dec 2022
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
IEEE International Conference on Computer Vision (ICCV), 2022
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
351
1,000
0
22 Dec 2022
Character-Aware Models Improve Visual Text Rendering
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Rosanne Liu
Daniel H Garrette
Chitwan Saharia
William Chan
Adam Roberts
Sharan Narang
Irina Blok
R. Mical
Mohammad Norouzi
Noah Constant
VLM
244
84
0
20 Dec 2022
Trustworthy Social Bias Measurement
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022
Rishi Bommasani
Abigail Z. Jacobs
243
13
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Dong Wang
James R. Glass
Yulia Tsvetkov
383
59
0
20 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
361
86
0
20 Dec 2022
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hongjin Su
Weijia Shi
Jungo Kasai
Yizhong Wang
Yushi Hu
Mari Ostendorf
Anuj Kumar
Noah A. Smith
Luke Zettlemoyer
Tao Yu
278
395
0
19 Dec 2022
Cross-Modal Similarity-Based Curriculum Learning for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hongkuan Zhang
Saku Sugawara
Akiko Aizawa
Lei Zhou
Ryohei Sasano
Koichi Takeda
VLM
146
5
0
14 Dec 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Computer Vision and Pattern Recognition (CVPR), 2022
Su Wang
Chitwan Saharia
Ceslee Montgomery
Jordi Pont-Tuset
Shai Noy
...
Radu Soricut
Jason Baldridge
Mohammad Norouzi
Peter Anderson
William Chan
223
250
0
13 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Computer Vision and Pattern Recognition (CVPR), 2022
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
371
180
0
13 Dec 2022
Multi-Concept Customization of Text-to-Image Diffusion
Computer Vision and Pattern Recognition (CVPR), 2022
Nupur Kumari
Bin Zhang
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
689
1,162
0
08 Dec 2022
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Jonghwan Hyeon
Ho-Jin Choi
296
12
0
08 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
215
10
0
06 Dec 2022
ObjectStitch: Generative Object Compositing
Yi-Zhe Song
Zhifei Zhang
Zhe Lin
Scott D. Cohen
Brian L. Price
Jianming Zhang
Seunggeun Kim
Daniel G. Aliaga
DiffM
294
40
0
02 Dec 2022
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
Computer Vision and Pattern Recognition (CVPR), 2022
Junbum Cha
Jonghwan Mun
Byungseok Roh
VLM
371
127
0
01 Dec 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
255
31
0
21 Nov 2022
Video Background Music Generation: Dataset, Method and Evaluation
IEEE International Conference on Computer Vision (ICCV), 2022
Le Zhuo
Zhaokai Wang
Baisen Wang
Yue Liao
Chenxi Bao
Stanley Peng
Miao Lu
Xiaobo Li
Fei Fang
Si Liu
VGen
251
46
0
21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
139
1
0
20 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
The Web Conference (WWW), 2022
Linli Yao
Wei Chen
Qin Jin
VLM
314
11
0
17 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
210
5
0
13 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Roy Sheffer
Yossi Adi
VLM
235
104
0
06 Nov 2022
Evaluating and Improving Factuality in Multimodal Abstractive Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
David Wan
Joey Tianyi Zhou
172
12
0
04 Nov 2022
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Wei Li
Xue Xu
Xinyan Xiao
Jiacheng Liu
Hu Yang
...
Zhanpeng Wang
Zhifan Feng
Qiaoqiao She
Yajuan Lyu
Hua Wu
493
31
0
28 Oct 2022
SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation
Social Science Research Network (SSRN), 2022
Zhaorui Tan
Xi Yang
Zihan Ye
Qiufeng Wang
Yuyao Yan
Anh Nguyen
Kaizhu Huang
EGVM
183
3
0
27 Oct 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
British Machine Vision Conference (BMVC), 2022
Chaofan Ma
Yu-Hao Yang
Yanfeng Wang
Ya Zhang
Weidi Xie
VLM
156
55
0
27 Oct 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Deutsch
Rotem Dror
Dan Roth
278
53
0
22 Oct 2022
Instance-Aware Image Completion
Ji-Ho Cho
Minguk Kang
Vibhav Vineet
Jaesik Park
ISeg
VLM
169
2
0
22 Oct 2022
DiffEdit: Diffusion-based semantic image editing with mask guidance
International Conference on Learning Representations (ICLR), 2022
Guillaume Couairon
Jakob Verbeek
Holger Schwenk
Matthieu Cord
DiffM
385
653
0
20 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
166
5
0
18 Oct 2022
Imagic: Text-Based Real Image Editing with Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Bahjat Kawar
Shiran Zada
Oran Lang
Omer Tov
Hui-Tang Chang
Tali Dekel
Inbar Mosseri
Michal Irani
543
1,329
0
17 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
420
1,850
0
05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
354
38
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Computer Vision and Pattern Recognition (CVPR), 2022
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
173
24
0
04 Oct 2022
Linearly Mapping from Image to Text Space
International Conference on Learning Representations (ICLR), 2022
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
1.2K
145
0
30 Sep 2022