Papers
Communities
Organizations
Events
Blog
Pricing
Feedback
Contact Sales
Search
Open menu
Home
Papers
All Papers
Title
Home
Papers
2505.17955
Cited By
v1
v2 (latest)
Diffusion Classifiers Understand Compositionality, but Conditions Apply
23 May 2025
Yujin Jeong
Arnas Uselis
Seong Joon Oh
Anna Rohrbach
DiffM
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (21 upvotes)
Papers citing
"Diffusion Classifiers Understand Compositionality, but Conditions Apply"
49 / 49 papers shown
Title
Video models are zero-shot learners and reasoners
Thaddäus Wiedemer
Yuxuan Li
Paul Vicol
Shixiang Shane Gu
Nick Matarese
Kevin Swersky
Been Kim
P. Jaini
Robert Geirhos
VLM
LRM
21
0
0
24 Sep 2025
Intermediate Layer Classifiers for OOD generalization
Arnas Uselis
Seong Joon Oh
OOD
473
6
0
07 Apr 2025
Open-Sora: Democratizing Efficient Video Production for All
Zangwei Zheng
Xiangyu Peng
Tianji Yang
Chenhui Shen
Shenggui Li
Hongxin Liu
Yukun Zhou
Tianyi Li
Yang You
VGen
266
323
0
31 Dec 2024
1.58-bit FLUX
Chenglin Yang
Celong Liu
XueQing Deng
Dongwon Kim
Xing Mei
Xiaohui Shen
Liang-Chieh Chen
MQ
115
16
0
24 Dec 2024
A Simple and Efficient Baseline for Zero-Shot Generative Classification
Zipeng Qi
Buhua Liu
Shiyan Zhang
Bao Li
Zhiqiang Xu
Haoyi Xiong
Bo Han
DiffM
VLM
137
3
0
17 Dec 2024
OmniGen: Unified Image Generation
Shitao Xiao
Yueze Wang
Yueze Wang
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM
VLM
SyDa
219
138
0
17 Sep 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
142
21
0
08 Jul 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
159
23
0
27 Jun 2024
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
165
18
0
17 Jun 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
498
1,898
0
05 Mar 2024
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue
Pan Zhou
Richang Hong
Hanwang Zhang
Qianru Sun
141
14
0
05 Mar 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
242
155
0
07 Feb 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
260
429
0
11 Jan 2024
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
195
31
0
30 Nov 2023
Adversarial Diffusion Distillation
Axel Sauer
Dominik Lorenz
A. Blattmann
Robin Rombach
307
468
0
28 Nov 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
580
1,469
0
25 Nov 2023
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Peter West
Ximing Lu
Nouha Dziri
Faeze Brahman
Linjie Li
...
Khyathi Chandu
Benjamin Newman
Pang Wei Koh
Allyson Ettinger
Yejin Choi
AIMat
154
87
0
31 Oct 2023
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Amita Kamath
Jack Hessel
Kai-Wei Chang
LRM
CoGe
164
150
0
30 Oct 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
147
301
0
17 Oct 2023
Intriguing properties of generative classifiers
P. Jaini
Kevin Clark
Robert Geirhos
BDL
239
42
0
28 Sep 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Cheng-Yu Hsieh
Jieyu Zhang
Zixian Ma
Aniruddha Kembhavi
Ranjay Krishna
CoGe
149
155
0
26 Jun 2023
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Saurabh Saxena
Charles Herrmann
Junhwa Hur
Abhishek Kar
Mohammad Norouzi
Deqing Sun
David J. Fleet
DiffM
182
106
0
02 Jun 2023
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
183
16
0
25 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
DiffM
183
11
0
18 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
185
44
0
05 May 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
244
272
0
28 Mar 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM
VLM
172
135
0
27 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
149
55
0
25 Mar 2023
Diffusion Models Generate Images Like Painters: an Analytical Theory of Outline First, Details Later
Binxu Wang
John J. Vastola
DiffM
250
29
0
04 Mar 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
280
260
0
03 Mar 2023
Teaching CLIP to Count to Ten
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
163
124
0
23 Feb 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
324
575
0
29 Jan 2023
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
376
3,951
0
16 Oct 2022
Flow Matching for Generative Modeling
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
419
1,823
0
06 Oct 2022
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
231
458
0
04 Oct 2022
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Xingchao Liu
Chengyue Gong
Qiang Liu
OOD
445
1,360
0
07 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
242
4,483
0
26 Jul 2022
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras
M. Aittala
Timo Aila
S. Laine
DiffM
603
2,293
0
01 Jun 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
861
6,681
0
23 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
596
7,473
0
13 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
198
463
0
07 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
887
17,919
0
20 Dec 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.4K
34,341
0
26 Feb 2021
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
2.1K
21,297
0
19 Jun 2020
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
1.6K
142,138
0
12 Jun 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
489
2,501
0
20 Dec 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
3.0K
201,685
0
10 Dec 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
2.4K
81,071
0
18 May 2015
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein
Eric A. Weiss
Niru Maheswaranathan
Surya Ganguli
SyDa
DiffM
888
7,718
0
12 Mar 2015
1