2403.03206
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
Papers citing
"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 796 papers shown
Title
Can Video Diffusion Model Reconstruct 4D Geometry?
Jinjie Mai
Wenxuan Zhu
Haozhe Liu
Bing Li
Cheng Zheng
Jürgen Schmidhuber
Bernard Ghanem
VGen
MDE
70
0
0
27 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Shitian Zhao
Qilong Wu
Xinyue Li
Bo Zhang
Ming-xing Li
...
H. Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
43
0
0
27 Mar 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
67
0
0
27 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Y. Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
48
3
0
27 Mar 2025
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
Zhiyuan Ma
Xinyue Liang
Rongyuan Wu
Xiangyu Zhu
Zhen Lei
Lei Zhang
71
0
0
27 Mar 2025
Optimal Stepsize for Diffusion Sampling
Jianning Pei
Han Hu
Shuyang Gu
41
0
0
27 Mar 2025
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Fan Qi
Yu Duan
Changsheng Xu
DiffM
47
0
0
27 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation
Haoyu Zhao
Zhongang Qi
Cong Wang
Qingping Zheng
Guansong Lu
Fei Chen
Hang Xu
Zuxuan Wu
DiffM
VGen
46
0
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
64
2
0
27 Mar 2025
Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images
Tai D. Nguyen
Aref Azizpour
Matthew C. Stamm
46
1
0
26 Mar 2025
Unified Multimodal Discrete Diffusion
Alexander Swerdlow
Mihir Prabhudesai
Siddharth Gandhi
Deepak Pathak
Katerina Fragkiadaki
DiffM
68
0
0
26 Mar 2025
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao
Xingyu Ni
Ziyu Wang
Feng Cheng
Ziyan Yang
Lu Jiang
Bohan Wang
VGen
41
2
0
26 Mar 2025
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng
Shishi Xiao
Keming Wu
Qisheng Liao
Bohan Chen
Kevin Lin
Danqing Huang
Ji Li
Yuhui Yuan
DiffM
66
1
0
26 Mar 2025
Video Motion Graphs
Haiyang Liu
Zhan Xu
Fa-Ting Hong
Hsin-Ping Huang
Yi Zhou
Yang Zhou
DiffM
VGen
88
0
0
26 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Z. Yang
Lijuan Wang
Min Li
DiffM
68
0
0
26 Mar 2025
EditCLIP: Representation Learning for Image Editing
Qian Wang
Aleksandar Cvejic
Abdelrahman Eldesokey
Peter Wonka
61
0
0
26 Mar 2025
IPGO: Indirect Prompt Gradient Optimization on Text-to-Image Generative Models with High Data Efficiency
Jianping Ye
Michel Wedel
Kunpeng Zhang
37
0
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
51
1
0
25 Mar 2025
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
64
0
0
25 Mar 2025
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang
Daqing Liu
Xinchen Liu
Xiaodong He
VLM
38
0
0
25 Mar 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan
Kaisiyuan Wang
Zhiliang Xu
Quanwei Yang
Yasheng Sun
...
Errui Ding
J. Wang
Youjian Zhao
Hang Zhou
Ziwei Liu
VGen
44
0
0
25 Mar 2025
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention
Xuan Ju
Weicai Ye
Quande Liu
Qiulin Wang
Xintao Wang
Pengfei Wan
Di Zhang
Kun Gai
Qiang Xu
VGen
39
1
0
25 Mar 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao
Z. Yang
Linjie Li
Dianqi Li
Kevin Qinghong Lin
Yu-Xi Cheng
Lijuan Wang
MLLM
LRM
57
0
0
25 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou
J. Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
62
1
0
25 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
71
1
0
25 Mar 2025
Panorama Generation From NFoV Image Done Right
Dian Zheng
Cheng Zhang
Xiao-Ming Wu
Cao Li
Chengfei Lv
Jian-Fang Hu
Wei-Shi Zheng
DiffM
79
0
0
24 Mar 2025
From Fragment to One Piece: A Survey on AI-Driven Graphic Design
Xingxing Zou
Wen Zhang
Nanxuan Zhao
54
0
0
24 Mar 2025
Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings
Cong Liu
Liang Hou
Mingwu Zheng
Xin Tao
Pengfei Wan
Di Zhang
Kun Gai
46
0
0
24 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
51
1
0
24 Mar 2025
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
Yifei Feng
M. Yang
S. M. I. Simon X. Yang
Sheng Zhang
J. Yu
Zibo Zhao
Yuhong Liu
Jie Jiang
Chunchao Guo
DiffM
56
0
0
24 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
54
0
0
24 Mar 2025
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Weichen Fan
Amber Yijia Zheng
Raymond A. Yeh
Ziwei Liu
46
1
0
24 Mar 2025
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X Chen
Misha Sra
Pradeep Sen
50
0
0
24 Mar 2025
Training-free Diffusion Acceleration with Bottleneck Sampling
Ye Tian
Xin Xia
Yuxi Ren
Shanchuan Lin
Xing Wang
Xuefeng Xiao
Yunhai Tong
L. Yang
Bin Cui
56
0
0
24 Mar 2025
Target-Aware Video Diffusion Models
Taeksoo Kim
Hanbyul Joo
DiffM
VGen
89
1
0
24 Mar 2025
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon
Shin seong Kim
Jaeseok Jeong
Yi Ting Hsiao
Youngjung Uh
DiffM
58
0
0
23 Mar 2025
Serial Low-rank Adaptation of Vision Transformer
Houqiang Zhong
Shaocheng Shen
Ke Cai
Zhenglong Wu
Jiangchao Yao
Yuan Cheng
Xuefei Li
Xiaoyun Zhang
Li-Na Song
Qiang Hu
37
0
0
22 Mar 2025
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
Yuheng Feng
Jianhui Wang
Kun Li
Sida Li
Tianyu Shi
Haoyue Han
Miao Zhang
Xueqian Wang
DiffM
53
0
0
22 Mar 2025
Guidance Free Image Editing via Explicit Conditioning
Mehdi Noroozi
Alberto Gil C. P. Ramos
Luca Morreale
Ruchika Chavhan
Malcolm Chadwick
Abhinav Mehrotra
Sourav Bhattacharya
DiffM
56
0
0
22 Mar 2025
ARFlow: Human Action-Reaction Flow Matching with Physical Guidance
Wentao Jiang
Jingya Wang
Haotao Lu
Kaiyang Ji
Baoxiong Jia
Siyuan Huang
Ye-ling Shi
39
0
0
21 Mar 2025
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu
Jinjin Gu
Jinfan Hu
Zheyuan Li
Chao Dong
DiffM
50
0
0
21 Mar 2025
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
Keyon Vafa
Sarah Bentley
Jon M. Kleinberg
S. Mullainathan
38
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
47
1
0
21 Mar 2025
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi
Fabio Quattrini
S. Cascianelli
Alessio Tonioni
Rita Cucchiara
37
0
0
21 Mar 2025
Scale-wise Distillation of Diffusion Models
Nikita Starodubcev
Denis Kuznedelev
Artem Babenko
Dmitry Baranchuk
DiffM
48
0
0
20 Mar 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang
Tingwei Gao
Jie Shao
Zuxuan Wu
64
0
0
20 Mar 2025
A Recipe for Generating 3D Worlds From a Single Image
Katja Schwarz
Denys Rozumnyi
Samuel Rota Buló
Lorenzo Porzi
Peter Kontschieder
VGen
74
1
0
20 Mar 2025
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
Tianyi Wei
Yifan Zhou
Dongdong Chen
Xingang Pan
72
0
0
20 Mar 2025
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker
Abhinav Mehrotra
Ruchika Chavhan
Malcolm Chadwick
Luca Morreale
Mehdi Noroozi
Alberto Gil C. P. Ramos
Sourav Bhattacharya
43
0
0
20 Mar 2025