arXiv:2405.05945
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
9 May 2024
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
Longtian Qiu
Yuhang Zhang
Chen Lin
Rongjie Huang
Shijie Geng
Renrui Zhang
Junlin Xi
Wenqi Shao
Zhengkai Jiang
Tianshuo Yang
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
Papers citing
"Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers"
26 / 76 papers shown
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
54
15
0
16 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
43
5
0
11 Jul 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He
Siming Fu
Mushui Liu
Xierui Wang
Wenyi Xiao
...
Zhelun Yu
Haoyuan Li
Ziwei Huang
Leilei Gan
Hao Jiang
DiffM
16
20
0
10 Jul 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
66
1
0
26 Jun 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma
Zhuofan Zong
Guanglu Song
Hongsheng Li
Yu Liu
30
19
0
17 Jun 2024
PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
Danpeng Chen
Hai Li
Weicai Ye
Yifan Wang
Weijian Xie
Shangjin Zhai
Nan Wang
Haomin Liu
Hujun Bao
Guofeng Zhang
3DGS
52
63
0
10 Jun 2024
CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
Xingrui Wang
Xin Li
Zhibo Chen
DiffM
34
1
0
07 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Yu Qiao
Hongsheng Li
Peng Gao
44
5
0
05 Jun 2024
HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
Wenxuan Liu
Saiqian Zhang
MQ
21
5
0
30 May 2024
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Yao Teng
Yue Wu
Han Shi
Xuefei Ning
Guohao Dai
Yu-Xiang Wang
Zhenguo Li
Xihui Liu
Mamba
29
23
0
23 May 2024
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
29
2
0
23 May 2024
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han
Filippos Kokkinos
Philip H. S. Torr
VGen
66
16
0
18 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
125
85
0
07 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
67
177
0
29 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
115
269
0
17 Jan 2024
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou
Peiqing Yang
Jianyi Wang
Yihang Luo
Chen Change Loy
VGen
85
33
0
11 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
150
985
0
25 Nov 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
135
137
0
24 Apr 2023
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
DiffM
240
260
0
15 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
137
304
0
30 Jan 2023
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer
Katja Schwarz
Andreas Geiger
174
485
0
01 Feb 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
262
10,183
0
12 Dec 2018
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
203
19,191
0
21 Nov 2016
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
219
2,391
0
25 Jan 2016