Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.05945
Cited By
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
9 May 2024
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
Longtian Qiu
Yuhang Zhang
Chen Lin
Rongjie Huang
Shijie Geng
Renrui Zhang
Junlin Xi
Wenqi Shao
Zhengkai Jiang
Tianshuo Yang
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers"
31 / 31 papers shown
Title
T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models
Yunfeng Ge
Jiawei Li
Yiji Zhao
Haomin Wen
Zhao Li
M. Qiu
H. Li
Ming Jin
Shirui Pan
DiffM
24
0
0
05 May 2025
Versatile Framework for Song Generation with Prompt-based Control
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
68
1
0
27 Apr 2025
OmniCaptioner: One Captioner to Rule Them All
Yiting Lu
Jiakang Yuan
Zhen Li
Shitian Zhao
Qi Qin
...
Lei Bai
Zhibo Chen
Peng Gao
Bo Zhang
Peng Gao
MLLM
76
0
0
09 Apr 2025
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Yuandong Pu
Le Zhuo
Kaiwen Zhu
Liangbin Xie
Wenlong Zhang
Xiangyu Chen
Peng Gao
Yu Qiao
Chao Dong
Yihao Liu
MLLM
55
1
0
07 Apr 2025
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Chenguo Lin
Panwang Pan
Bangbang Yang
Zeming Li
Yadong Mu
3DGS
55
7
0
28 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming Yang
Sergey Tulyakov
DiffM
VGen
62
7
0
10 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
110
2
0
03 Jan 2025
Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
Quan Dao
Hao Phung
T. Dao
Dimitris Metaxas
Anh Tran
64
1
0
22 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
95
2
0
14 Dec 2024
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
A. Hengel
Jian Yang
Qingming Huang
90
6
0
12 Dec 2024
Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models
Salma Abdel Magid
Weiwei Pan
Simon Warchol
Grace Guo
Junsik Kim
Mahia Rahman
Hanspeter Pfister
84
0
0
06 Oct 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Y. Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
33
2
0
19 Sep 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Yu Qiao
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
49
48
0
05 Aug 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
62
41
0
17 Jul 2024
PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
Danpeng Chen
Hai Li
Weicai Ye
Yifan Wang
Weijian Xie
Shangjin Zhai
Nan Wang
Haomin Liu
Hujun Bao
Guofeng Zhang
3DGS
52
63
0
10 Jun 2024
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
27
2
0
23 May 2024
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han
Filippos Kokkinos
Philip H. S. Torr
VGen
63
16
0
18 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
125
85
0
07 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
67
177
0
29 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
115
269
0
17 Jan 2024
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou
Peiqing Yang
Jianyi Wang
Yihang Luo
Chen Change Loy
VGen
79
33
0
11 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
150
985
0
25 Nov 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
135
137
0
24 Apr 2023
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
DiffM
240
260
0
15 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
137
304
0
30 Jan 2023
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer
Katja Schwarz
Andreas Geiger
174
354
0
01 Feb 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
262
10,183
0
12 Dec 2018
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
203
19,191
0
21 Nov 2016
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
219
2,391
0
25 Jan 2016
1