arXiv:2403.03206
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
HuggingFace (68 upvotes)
Papers citing "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 1,247 papers shown
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang
Siteng Huang
Shengke Xue
Yaxi Zhao
Jun Cen
...
Kexiang Wang
Mingxiu Chen
F. Wang
Deli Zhao
Xin Li
VGen
LM&Ro
95
8
0
18 Sep 2025
LowDiff: Efficient Diffusion Sampling with Low-Resolution Condition
Jiuyi Xu
Qing Jin
Meida Chen
Andrew Feng
Yang Sui
Yangming Shi
DiffM
156
0
0
18 Sep 2025
Radiology Report Conditional 3D CT Generation with Multi Encoder Latent diffusion Model
Sina Amirrajab
Zohaib Salahuddin
Sheng Kuang
Henry C. Woodruff
Philippe Lambin
DiffM
MedIm
129
0
0
18 Sep 2025
FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver
Shuangshuang He
Yuanting Zhang
Hongli Liang
Qingye Meng
Xingyuan Yuan
Shuo Wang
174
0
0
18 Sep 2025
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Mingsong Li
Lin Liu
Hongjun Wang
Haoxing Chen
Xijun Gu
Shizhan Liu
Dong Gong
Junbo Zhao
Zhenzhong Lan
Jianguo Li
144
0
0
18 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
236
7
0
17 Sep 2025
BiasMap: Leveraging Cross-Attentions to Discover and Mitigate Hidden Social Biases in Text-to-Image Generation
Rajatsubhra Chakraborty
Xujun Che
Depeng Xu
Cori Faklaris
Xi Niu
Shuhan Yuan
109
0
0
16 Sep 2025
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Zixin Yin
Xili Dai
Duomin Wang
Xianfang Zeng
Lionel M. Ni
Gang Yu
H. Shum
DiffM
208
1
0
15 Sep 2025
Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation
Yufei Tang
Daiheng Gao
Pingyu Wu
Wenbo Zhou
Bang Zhang
Weiming Zhang
DiffM
151
0
0
14 Sep 2025
TrueSkin: Towards Fair and Accurate Skin Tone Recognition and Generation
Haoming Lu
SyDa
110
1
0
13 Sep 2025
MagicMirror: A Large-Scale Dataset and Benchmark for Fine-Grained Artifacts Assessment in Text-to-Image Generation
Jia Wang
Jie Hu
Xiaoqi Ma
Hanghang Ma
Yanbing Zeng
Xiaoming Wei
EGVM
VGen
194
1
0
12 Sep 2025
Unified Multimodal Model as Auto-Encoder
Zhiyuan Yan
Kaiqing Lin
Zongjian Li
Junyan Ye
Hui Han
...
Xue Xu
Xinyan Xiao
Jingdong Wang
Haifeng Wang
Li Yuan
326
1
0
11 Sep 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Rongyao Fang
Aldrich Yu
Chengqi Duan
Linjiang Huang
S. Bai
Yuxuan Cai
Kun Wang
Si Liu
Xihui Liu
Xue Yang
EGVM
VGen
ReLM
LRM
227
14
0
11 Sep 2025
RewardDance: Reward Scaling in Visual Generation
Jie Wu
Yu Gao
Zilyu Ye
Ming Li
Liang Li
...
Zeyue Xue
Xiaoxia Hou
Wei Liu
Yan Zeng
Weilin Huang
EGVM
218
20
0
10 Sep 2025
Integrating Anatomical Priors into a Causal Diffusion Model
Binxu Li
Wei Peng
Mingjie Li
Ehsan Adeli
K. Pohl
DiffM
MedIm
142
0
0
10 Sep 2025
Universal Few-Shot Spatial Control for Diffusion Models
Kiet T. Nguyen
Chanhuyk Lee
Donggyun Kim
Dong Hoon Lee
Seunghoon Hong
109
0
0
09 Sep 2025
ANYPORTAL: Zero-Shot Consistent Video Background Replacement
Wenshuo Gao
Xicheng Lan
Shuai Yang
DiffM
VGen
141
1
0
09 Sep 2025
Testing chatbots on the creation of encoders for audio conditioned image generation
Jorge E. León
Miguel Carrasco
156
0
0
09 Sep 2025
RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction
Zheyuan Hu
Robyn Wu
Naveen Enock
Jasmine Li
Riya Kadakia
Zackory Erickson
Aviral Kumar
122
8
0
09 Sep 2025
SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting
Mahtab Dahaghin
Milind G. Padalkar
M. Toso
Alessio Del Bue
3DGS
126
0
0
09 Sep 2025
Reconstruction Alignment Improves Unified Multimodal Models
Ji Xie
Trevor Darrell
Luke Zettlemoyer
Xudong Wang
218
15
0
08 Sep 2025
MeanFlow-Accelerated Multimodal Video-to-Audio Synthesis via One-Step Generation
Xiaoran Yang
Jianxuan Yang
Xinyue Guo
Haoyu Wang
Ningning Pan
Gongping Huang
VGen
129
0
0
08 Sep 2025
LLaDA-VLA: Vision Language Diffusion Action Models
Yuqing Wen
Hebei Li
Kefan Gu
Yucheng Zhao
Tiancai Wang
Xiaoyan Sun
VLM
207
8
0
08 Sep 2025
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Yufeng Cheng
Wenxu Wu
Shaojin Wu
Mengqi Huang
Fei Ding
Qian He
104
6
0
08 Sep 2025
Interleaving Reasoning for Better Text-to-Image Generation
Wenxuan Huang
Shuang Chen
Zheyong Xie
Shaosheng Cao
Shixiang Tang
...
Z. Yin
Juil Sock
Yu Cheng
Wanli Ouyang
Shaohui Lin
244
11
0
08 Sep 2025
Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching
Feng Wang
Zihao Yu
DiffM
255
12
0
07 Sep 2025
Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models
Jisung Hwang
Jaihoon Kim
Minhyuk Sung
127
0
0
07 Sep 2025
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
Yi Yuan
Xubo Liu
Haohe Liu
Xiyuan Kang
Zhuo Chen
Yuping Wang
Mark D. Plumbley
Wenwu Wang
DiffM
136
1
0
07 Sep 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
135
1
0
06 Sep 2025
Diffusion Secant Alignment for Score-Based Density Ratio Estimation
Wei Chen
Shigui Li
Jiacheng Li
Jian Xu
Zhiqi Lin
Junmei Yang
Delu Zeng
John Paisley
Qibin Zhao
188
0
0
05 Sep 2025
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Moritz Reuss
Hongyi Zhou
Marcel Rühle
Ömer Erdinç Yagmurlu
Fabian Otto
Rudolf Lioutikov
LM&Ro
VLM
162
16
0
05 Sep 2025
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model
Hongyang Wei
Baixin Xu
Hongbo Liu
Cyrus Wu
J. Liu
...
Ying He
Yang Liu
Xuchen Song
Eric Li
Y. Zhou
182
12
0
04 Sep 2025
Transition Models: Rethinking the Generative Learning Objective
Z. Wang
Yiyuan Zhang
Xiaoyu Yue
Xiangyu Yue
Yangguang Li
Wanli Ouyang
Lei Bai
159
10
0
04 Sep 2025
Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion
Dongliang Cao
Guoxing Sun
Marc Habermann
Florian Bernard
204
1
0
04 Sep 2025
Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models
Kiymet Akdemir
Jing Shi
Kushal Kafle
Brian L. Price
Pinar Yanardag
DiffM
129
0
0
04 Sep 2025
PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting
Linqing Wang
Ximing Xing
Yiji Cheng
Zhiyuan Zhao
Donghao Li
...
Chunyu Wang
Xinchi Deng
S. Gu
C. Wang
Qinglin Lu
363
10
0
04 Sep 2025
MEPG: Multi-Expert Planning and Generation for Compositionally-Rich Image Generation
Yuan Zhao
Lin Liu
DiffM
MoE
191
0
0
04 Sep 2025
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li
Xinyu Peng
Y. Wang
Zelin Peng
Xin Chen
Rongxiang Weng
Jingang Wang
Xunliang Cai
Wenrui Dai
Hongkai Xiong
MLLM
OffRL
364
12
0
03 Sep 2025
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Ouxiang Li
Yuan Wang
Xinting Hu
Huijuan Huang
Rui Chen
Jiarong Ou
Xin Tao
Pengfei Wan
Xiaojuan Qi
Fuli Feng
EGVM
CoGe
LRM
315
6
0
03 Sep 2025
Distribution estimation via Flow Matching with Lipschitz guarantees
Lea Kunkel
DiffM
130
2
0
02 Sep 2025
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
D. She
Siming Fu
Mushui Liu
Qiaoqiao Jin
Hualiang Wang
Mu Liu
Jidong Jiang
121
2
0
02 Sep 2025
InfoScale: Unleashing Training-free Variable-scaled Image Generation via Effective Utilization of Information
Guohui Zhang
Jiangtong Tan
Linjiang Huang
Zhonghang Yuan
Naishan Zheng
Jie Huang
Feng Zhao
301
0
0
01 Sep 2025
ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training
Ge Yan
Jiyue Zhu
Yuquan Deng
Shiqi Yang
Ri-Zhao Qiu
...
Marius Memmel
Ranjay Krishna
Ankit Goyal
Xiaolong Wang
Dieter Fox
145
6
0
01 Sep 2025
FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework
Lingzhou Mu
Qiang Wang
Fan Jiang
Mengchao Wang
Yaqi Fan
Mu Xu
Kai Zhang
VGen
152
0
0
01 Sep 2025
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang Zhang
Rongyuan Wu
Lingchen Sun
Lei Zhang
277
2
0
01 Sep 2025
Delta Velocity Rectified Flow for Text-to-Image Editing
Gaspard Beaudouin
Minghan Li
Jaeyeon Kim
Sung-Hoon Yoon
Mengyu Wang
DiffM
233
1
0
01 Sep 2025
Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement
Jiayi Gao
Changcheng Hua
Qingchao Chen
Yuxin Peng
Yang Liu
DiffM
VGen
141
2
0
01 Sep 2025
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
Xuechao Zou
Shun Zhang
Xing Fu
Y. Li
Kai Li
Yushe Cao
Congyan Lang
Pin Tao
Junliang Xing
DiffM
191
0
0
30 Aug 2025
Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations
Ha Min Son
Zhe Zhao
Shahbaz Rezaei
Xin Liu
233
0
0
29 Aug 2025
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur
Thibault Groueix
Wang Yifan
Rana Hanocka
Vladimir G. Kim
Matheus Gadelha
137
0
0
28 Aug 2025