ResearchTrend.AI
LAION-5B: An open large-scale dataset for training next generation image-text models

16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
    VLM
    MLLM
    CLIP

Papers citing "LAION-5B: An open large-scale dataset for training next generation image-text models"

50 / 520 papers shown
SceneWiz3D: Towards Text-guided 3D Scene Composition
Qihang Zhang
Chaoyang Wang
Aliaksandr Siarohin
Peiye Zhuang
Yinghao Xu
Ceyuan Yang
Dahua Lin
Bolei Zhou
Sergey Tulyakov
Hsin-Ying Lee
20
31
0
13 Dec 2023
Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences
C. Kupferschmidt
A. Binns
K. L. Kupferschmidt
G. W. Taylor
DiffM
11
0
0
13 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
39
62
0
11 Dec 2023
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng
Andrea Vedaldi
3DV
37
48
0
07 Dec 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
39
188
0
07 Dec 2023
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen
A. Bhunia
Subhadeep Koley
Aneeshan Sain
Pinaki Nath Chowdhury
Yi-Zhe Song
18
8
0
07 Dec 2023
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Ali Naseh
Jaechul Roh
Amir Houmansadr
DiffM
20
6
0
06 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
25
82
0
06 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
16
6
0
06 Dec 2023
DiffusionSat: A Generative Foundation Model for Satellite Imagery
Samar Khanna
Patrick Liu
Linqi Zhou
Chenlin Meng
Robin Rombach
Marshall Burke
David B. Lobell
Stefano Ermon
22
57
0
06 Dec 2023
Kandinsky 3.0 Technical Report
V.Ya. Arkhipkin
Andrei Filatov
Viacheslav Vasilev
Anastasia Maltseva
Said Azizov
Igor Pavlov
Julia Agafonova
Andrey Kuznetsov
Denis Dimitrov
DiffM
28
10
0
06 Dec 2023
FaceStudio: Put Your Face Everywhere in Seconds
Yuxuan Yan
C. Zhang
Rui Wang
Yichao Zhou
Gege Zhang
Pei Cheng
Gang Yu
Bin-Bin Fu
DiffM
24
39
0
05 Dec 2023
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po
Guandao Yang
Kfir Aberman
Gordon Wetzstein
DiffM
23
26
0
05 Dec 2023
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu
Chenchen Zhu
Sean Culatana
Raghuraman Krishnamoorthi
Fanyi Xiao
Yong Jae Lee
109
14
0
04 Dec 2023
LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models
Qiang Wen
Yazhou Xing
Zhefan Rao
Qifeng Chen
DiffM
30
0
0
02 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
23
49
0
01 Dec 2023
Text-Guided 3D Face Synthesis -- From Generation to Editing
Yunjie Wu
Yapeng Meng
Zhipeng Hu
Lincheng Li
Haoqian Wu
Kun Zhou
Weiwei Xu
Xin Yu
DiffM
48
9
0
01 Dec 2023
Initializing Models with Larger Ones
Zhiqiu Xu
Yanjie Chen
Kirill Vishniakov
Yida Yin
Zhiqiang Shen
Trevor Darrell
Lingjie Liu
Zhuang Liu
28
17
0
30 Nov 2023
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Yijia Zheng
Raymond A. Yeh
30
8
0
30 Nov 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
21
17
0
30 Nov 2023
Meta Co-Training: Two Views are Better than One
Jay C. Rothenberger
Dimitrios I. Diochnos
VLM
25
2
0
29 Nov 2023
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani
Ivan Skorokhodov
Victor Rong
Gordon Wetzstein
Leonidas J. Guibas
Peter Wonka
Sergey Tulyakov
Jeong Joon Park
Andrea Tagliasacchi
David B. Lindell
DiffM
41
103
0
29 Nov 2023
M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation
Xiaowei Chi
Rongyu Zhang
Zhengkai Jiang
Yijiang Liu
Ziyi Lin
...
Chaoyou Fu
Peng Gao
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
MLLM
33
1
0
29 Nov 2023
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Hamed Damirchi
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Javen Qinfeng Shi
Stephen Gould
A. Hengel
VLM
29
0
0
29 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
39
1
0
29 Nov 2023
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu
Xiaohang Zhan
Jiaxiang Tang
Ying Shan
Gang Zeng
Dahua Lin
Xihui Liu
Ziwei Liu
3DGS
35
72
0
28 Nov 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
26
43
0
28 Nov 2023
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo
Kunho Kim
Vladimir G. Kim
Minhyuk Sung
DiffM
21
13
0
28 Nov 2023
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang
Siyuan Qiao
Yuan Cao
Yu Zhang
Tao Zhu
Alan L. Yuille
Jiahui Yu
VLM
16
3
0
27 Nov 2023
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Tingting Gao
Di Zhang
VLM
126
24
0
24 Nov 2023
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang
Jian Tao
Jiafei Lyu
Chunjiang Ge
Jiaxin Chen
Qimai Li
Weihan Shen
Xiaolong Zhu
Xiu Li
EGVM
19
87
0
22 Nov 2023
Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning
Kai Yu
Jinlin Liu
Mengyang Feng
Miaomiao Cui
Xuansong Xie
33
6
0
22 Nov 2023
Nepotistically Trained Generative-AI Models Collapse
Matyáš Boháček
Hany Farid
46
17
0
20 Nov 2023
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li
Hao Tan
Kai Zhang
Zexiang Xu
Fujun Luan
Yinghao Xu
Yicong Hong
Kalyan Sunkavalli
Greg Shakhnarovich
Sai Bi
43
254
0
10 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
26
14
0
09 Nov 2023
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Prasad Gabbur
DiffM
18
1
0
08 Nov 2023
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
Chenfeng Xu
Huan Ling
Sanja Fidler
Or Litany
8
14
0
07 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
57
22
0
02 Nov 2023
Are Natural Domain Foundation Models Useful for Medical Image Classification?
Joana Palés Huix
Adithya Raju Ganeshan
Johan Fredin Haslum
Magnus P Soderberg
Christos Matsoukas
Kevin Smith
OOD
MedIm
VLM
19
30
0
30 Oct 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
33
10
0
25 Oct 2023
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang
DiffM
51
8
0
25 Oct 2023
Online Detection of AI-Generated Images
David C. Epstein
Ishan Jain
Oliver Wang
Richard Y. Zhang
27
52
0
23 Oct 2023
Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track
Shuhei Yokoo
Peifei Zhu
Yuchi Ishikawa
Mikihiro Tanaka
Masayoshi Kondo
Hirokatsu Kataoka
16
0
0
23 Oct 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
46
3
0
22 Oct 2023
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Xian Liu
Jian Ren
Aliaksandr Siarohin
Ivan Skorokhodov
Yanyu Li
Dahua Lin
Xihui Liu
Ziwei Liu
Sergey Tulyakov
32
57
0
12 Oct 2023
Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders
Jan Dubiński
Stanislaw Pawlak
Franziska Boenisch
Tomasz Trzciński
Adam Dziedzic
AAML
21
3
0
12 Oct 2023
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao
Yuchao Gu
Jay Zhangjie Wu
David Junhao Zhang
Jia-Wei Liu
Weijia Wu
Jussi Keppo
Mike Zheng Shou
DiffM
VGen
25
103
0
12 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
29
92
0
27 Sep 2023
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
Kai Wang
Fei Yang
Shiqi Yang
Muhammad Atif Butt
Joost van de Weijer
DiffM
30
51
0
27 Sep 2023