Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.08402
Cited By
LAION-5B: An open large-scale dataset for training next generation image-text models
16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LAION-5B: An open large-scale dataset for training next generation image-text models"
50 / 502 papers shown
Title
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
26
0
0
13 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Mingli Song
Jie Song
MQ
41
0
0
08 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Aaron C. Courville
Oriol Nieto
Prem Seetharaman
Justin Salamon
43
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
42
0
0
08 May 2025
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Viacheslav Vasilev
V. Arkhipkin
Julia Agafonova
Tatiana Nikulina
Evelina Mironova
Alisa Shichanina
Nikolai Gerasimenko
Mikhail Shoytov
Denis Dimitrov
43
0
0
07 May 2025
Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation
Hengyuan Hu
Aniket Das
Dorsa Sadigh
Nima Anari
DiffM
19
0
0
06 May 2025
Panoramic Out-of-Distribution Segmentation
Mengfei Duan
Kailun Yang
Y. Zhang
Yihong Cao
Fei Teng
Kai Luo
Jiaming Zhang
Zhiyong Li
Shutao Li
52
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
64
0
0
06 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
L. Wang
Senmao Li
Fei Yang
Jianye Wang
Ziheng Zhang
Y. Liu
Y. Wang
Jian Yang
DiffM
56
0
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
76
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
62
0
0
05 May 2025
Incentivizing Inclusive Contributions in Model Sharing Markets
Enpei Zhang
Jingyi Chai
Rui Ye
Yanfeng Wang
Siheng Chen
TDI
FedML
87
0
0
05 May 2025
MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization
Chenghong Li
Hongjie Liao
Yihao Zhi
Xihe Yang
Zhengwentai Sun
Jiahao Chang
Shuguang Cui
Xiaoguang Han
3DH
47
0
0
03 May 2025
Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai
Yiyou Sun
Wei Cheng
Haifeng Chen
AAML
51
0
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
81
0
0
30 Apr 2025
Erased but Not Forgotten: How Backdoors Compromise Concept Erasure
Jonas Henry Grebe
Tobias Braun
Marcus Rohrbach
Anna Rohrbach
AAML
77
0
0
29 Apr 2025
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard
Moreno DÍncà
Massimiliano Mancini
Elisa Ricci
SSL
72
0
0
29 Apr 2025
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
82
0
0
29 Apr 2025
Boosting 3D Liver Shape Datasets with Diffusion Models and Implicit Neural Representations
K. T. Nguyen
Francesca Tozzi
W. Willaert
J. Vankerschaver
Nikdokht Rashidian
W. D. Neve
DiffM
MedIm
75
0
0
28 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
57
0
0
28 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
73
0
0
28 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
53
0
0
26 Apr 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
C. L. P. Chen
Daochang Liu
M. Shah
Chang Xu
57
1
0
25 Apr 2025
VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation
Mingxia Zhan
Li Zhang
Xiaomeng Chu
Beibei Wang
MDE
57
0
0
21 Apr 2025
Backdoor Defense in Diffusion Models via Spatial Attention Unlearning
Abha Jha
Ashwath Vaithinathan Aravindan
Matthew Salaway
Atharva Sandeep Bhide
Duygu Nur Yaldiz
AAML
70
0
0
21 Apr 2025
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
C. Kim
Jihwan Moon
Sangwoo Moon
Heeseung Yun
Sihaeng Lee
Aniruddha Kembhavi
Soonyoung Lee
Gunhee Kim
Sangho Lee
Christopher Clark
23
0
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
42
1
0
20 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
B. He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
S. Song
David Ouyang
James Y. Zou
LM&MA
45
0
0
19 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results
Xin Li
Kun Yuan
B. Li
Fengbin Guan
Yizhen Shao
...
Guohua Zhang
Z. Huang
Y. Deng
Qingmiao Jiang
Lu Chen
53
7
0
17 Apr 2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang
Lingen Li
Xuan Ju
Zhaoyang Zhang
C. Yuan
Ying Shan
DiffM
62
0
0
16 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
From Visual Explanations to Counterfactual Explanations with Latent Diffusion
Tung Luu
Nam Le
Duc Le
Bac Le
DiffM
AAML
FAtt
44
0
0
12 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Z. Chen
Zongyu Lin
MLLM
VLM
MoE
137
1
0
10 Apr 2025
MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
Paul Borne--Pons
Mikolaj Czerkawski
Rosalie Martin
Romain Rouffet
DiffM
19
2
0
09 Apr 2025
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition
Rupayan Mallick
Sibo Dong
Nataniel Ruiz
Sarah Adel Bargal
DiffM
44
0
0
08 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
28
0
0
07 Apr 2025
Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models
Muna Numan Said
Aarib Zaidi
Rabia Usman
Sonia Okon
Praneeth Medepalli
Kevin Zhu
Vasu Sharma
Sean O'Brien
22
0
0
05 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
64
0
0
03 Apr 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Steven G. McDonagh
Sarah Parisot
MoE
38
0
0
28 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
94
2
0
27 Mar 2025
SyncSDE: A Probabilistic Framework for Diffusion Synchronization
Hyunjun Lee
Hyunsoo Lee
Sookwan Han
DiffM
46
0
0
27 Mar 2025
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
Fernando Julio Cendra
Kai Han
VLM
51
0
0
25 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
54
1
0
24 Mar 2025
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran
Xavier Thomas
Prakhar Kaushik
Deepti Ghadiyaram
DiffM
73
0
0
22 Mar 2025
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu
Jinjin Gu
Jinfan Hu
Zheyuan Li
Chao Dong
DiffM
50
0
0
21 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
103
0
0
20 Mar 2025
How to Train Your Dragon: Automatic Diffusion-Based Rigging for Characters with Diverse Topologies
Zeqi Gu
Difan Liu
Timothy Langlois
Matthew Fisher
Abe Davis
DiffM
3DH
60
0
0
19 Mar 2025
DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis
Chen Gong
Kecen Li
Zinan Lin
Tianhao Wang
47
3
0
18 Mar 2025
1
2
3
4
...
9
10
11
Next