arXiv:2404.04125
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
4 April 2024
Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge
Tags: VLM
Papers citing "No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" (18 papers shown)
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models (06 May 2025)
Abram Schonfeldt, Benjamin Maylor, Xiaofang Chen, Ronald Clark, Aiden Doherty

A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI (26 Mar 2025)
Alejandro Lozano, M. W. Sun, James Burgess, Jeffrey Nirschl, Christopher Polzak, ..., Xiaohan Wang, Alfred Seunghoon Song, Chiang Chia-Chun, Robert Tibshirani, Serena Yeung-Levy
Tags: LM&MA

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding (17 Feb 2025)
Kung-Hsiang Huang, Can Qin, Haoyi Qiu, Philippe Laban, Shafiq R. Joty, Caiming Xiong, C. Wu
Tags: VLM

Audio-Language Datasets of Scenes and Events: A Survey (10 Jan 2025)
Gijs Wijngaard, Elia Formisano, Michele Esposito, M. Dumontier

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better (03 Jan 2025)
Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna
Tags: DiffM

Estimating Causal Effects of Text Interventions Leveraging LLMs (28 Oct 2024)
Siyi Guo, Myrl G. Marmarelis, Fred Morstatter, Kristina Lerman
Tags: CML

SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning (26 Sep 2024)
Rimvydas Rubavicius, Peter David Fagan, A. Lascarides, Subramanian Ramamoorthy
Tags: LM&Ro

SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training? (02 Feb 2024)
Hasan Hammoud, Hani Itani, Fabio Pizzati, Philip H. S. Torr, Adel Bibi, Bernard Ghanem
Tags: CLIP, VLM

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web (19 Nov 2023)
Ameya Prabhu, Hasan Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H. S. Torr, Adel Bibi
Tags: CLL

Holistic Evaluation of Text-To-Image Models (07 Nov 2023)
Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, ..., Jun-Yan Zhu, Fei-Fei Li, Jiajun Wu, Stefano Ermon, Percy Liang

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 (28 Apr 2023)
Kent K. Chang, Mackenzie Cramer, Sandeep Soni, David Bamman
Tags: RALM

Generating images of rare concepts using pre-trained diffusion models (27 Apr 2023)
Dvir Samuel, Rami Ben-Ari, Simon Raviv, N. Darshan, Gal Chechik

Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation (29 Mar 2023)
Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed

CyCLIP: Cyclic Contrastive Language-Image Pretraining (28 May 2022)
Shashank Goel, Hritik Bansal, S. Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover
Tags: CLIP, VLM

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (08 Feb 2022)
Jaemin Cho, Abhaysinh Zala, Mohit Bansal
Tags: ViT

Deduplicating Training Data Makes Language Models Better (14 Jul 2021)
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
Tags: SyDa

Zero-Shot Text-to-Image Generation (24 Feb 2021)
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Tags: VLM

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (17 Feb 2021)
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
Tags: VLM