What You See is What You Read? Improving Text-Image Alignment Evaluation
arXiv:2305.10400 · 17 May 2023
Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, E. Ofek, Idan Szpektor
EGVM

Papers citing "What You See is What You Read? Improving Text-Image Alignment Evaluation"

Showing 12 of 62 citing papers:

Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan
VLM · 10 / 7 / 0 · 02 Jun 2023

Transferring Visual Attributes from Natural Language to Verified Image Generation
Rodrigo Valerio, João Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, João Magalhães
13 / 5 / 0 · 24 May 2023

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He, Weixi Feng, Tsu-jui Fu, Varun Jampani, Arjun Reddy Akula, P. Narayana, Sugato Basu, William Yang Wang, X. Wang
DiffM · 32 / 7 / 0 · 18 May 2023

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy
EGVM · 152 / 345 / 0 · 02 May 2023

q2d: Turning Questions into Dialogs to Teach Models How to Search
Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb
24 / 3 / 0 · 27 Apr 2023

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, Roy Schwartz
VLM · 113 / 65 / 0 · 13 Mar 2023

Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2
Ali Borji
DiffM · 56 / 117 / 0 · 02 Oct 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 301 / 11,730 / 0 · 04 Mar 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM, BDL, VLM, CLIP · 380 / 4,010 / 0 · 28 Jan 2022

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das
149 / 100 / 0 · 14 Jul 2021

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM · 253 / 4,735 / 0 · 24 Feb 2021

Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Chuanrong Li, Lin Shengshuo, Leo Z. Liu, Xinyi Wu, Xuhui Zhou, Shane Steinert-Threlkeld
VLM · 112 / 38 / 0 · 16 Oct 2020