2403.03206
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
Papers citing
"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 1,219 papers shown
Title
RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
Wangbo Zhao
Yizeng Han
Zhiwei Tang
Jiasheng Tang
Pengfei Zhou
Kai Wang
Bohan Zhuang
Zinan Lin
Fan Wang
Yang You
148
1
0
26 Sep 2025
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
Abdelrahman Eldesokey
Aleksandar Cvejic
Bernard Ghanem
Peter Wonka
104
0
0
26 Sep 2025
Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication
Evgeny Kaskov
Elizaveta Petrova
Petr Surovtsev
Anna Kostikova
Ilya Mistiurin
A. Kapitanov
Alexander Nagaev
DiffM
269
0
0
25 Sep 2025
Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
S. Kasaei
Ali Aghayari
Arash Marioriyad
Niki Sepasian
MohammadAmin Fazli
Mahdieh Soleymani Baghshah
M. Rohban
EGVM
207
0
0
25 Sep 2025
FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies
Shuqiao Liang
Jian Liu
Renzhang Chen
Quanlong Guan
141
2
0
25 Sep 2025
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Shilin Lu
Zhuming Lian
Zihan Zhou
Shaocong Zhang
Chen Zhao
A. Kong
262
11
0
25 Sep 2025
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows
Hmrishav Bandyopadhyay
Rahim Entezari
Jim Scott
Reshinth Adithyan
Yi-Zhe Song
Varun Jampani
285
1
0
25 Sep 2025
ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection
Tai-Ming Huang
Wei-Tung Lin
Kai-Lung Hua
Wen-Huang Cheng
Junichi Yamagishi
Jun-Cheng Chen
OffRL
LRM
116
3
0
24 Sep 2025
InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On
Julien Han
Shuwen Qiu
Qi Li
Xingzi Xu
M. S. Seyfioglu
Kavosh Asadi
Karim Bouyarmane
DiffM
136
3
0
24 Sep 2025
MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization
Jianxuan Yang
Xiaoran Yang
Lipan Zhang
Xinyue Guo
Zhao Wang
Gongping Huang
VGen
134
0
0
24 Sep 2025
Efficient Encoder-Free Pose Conditioning and Pose Control for Virtual Try-On
Qi Li
Shuwen Qiu
Julien Han
Xingzi Xu
M. S. Seyfioglu
Kee Kiat Koo
Karim Bouyarmane
3DH
172
1
0
24 Sep 2025
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
Teng Xiao
Zuchao Li
Lefei Zhang
165
0
0
23 Sep 2025
SimpleFold: Folding Proteins is Simpler than You Think
Yuyang Wang
Jiarui Lu
Navdeep Jaitly
J. Susskind
Miguel Angel Bautista
184
5
0
23 Sep 2025
CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
Chen Chen
Pengsheng Guo
Liangchen Song
Jiasen Lu
Rui Qian
Xinze Wang
Tsu-Jui Fu
Wei Liu
Yinfei Yang
Alex Schwing
DiffM
OOD
100
0
0
23 Sep 2025
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
Bingnan Li
Chen Wang
Haiyang Xu
Xiang Zhang
Ethan Armand
Divyansh Srivastava
Xiaojun Shan
Zeyuan Chen
Jianwen Xie
Zhuowen Tu
VLM
118
1
0
23 Sep 2025
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Sudhanshu Agrawal
Risheek Garrepalli
Raghavv Goel
Mingu Lee
Christopher Lott
Fatih Porikli
202
6
0
22 Sep 2025
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim
Heeseong Shin
Eunbeen Hong
Heeji Yoon
Anurag Arnab
Paul Hongsuck Seo
Sunghwan Hong
Seungryong Kim
172
6
0
22 Sep 2025
ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation
Guocheng Qian
Daniil Ostashev
Egor Nemchinov
Avihay Assouline
Sergey Tulyakov
Kuan-Chien Wang
Kfir Aberman
DiffM
182
5
0
22 Sep 2025
Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
Zhitao Zeng
Guojian Yuan
Junyuan Mao
Yuxuan Wang
Xiaoshuang Jia
Yueming Jin
204
0
0
22 Sep 2025
Stencil: Subject-Driven Generation with Context Guidance
International Conference on Image Processing (ICIP), 2025
Gordon Chen
Ziqi Huang
Cheston Tan
Ziwei Liu
DiffM
110
0
0
21 Sep 2025
VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation
Feng Han
Chao Gong
Zhipeng Wei
Yue Yu
Yu Jiang
DiffM
134
0
0
21 Sep 2025
InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention
Qiang Xiang
Shuang Sun
Binglei Li
Dejia Song
Huaxia Li
Nemo Chen
Xu Tang
Yao Hu
Junping Zhang
DiffM
256
0
0
20 Sep 2025
FG-Attn: Leveraging Fine-Grained Sparsity In Diffusion Transformers
Sankeerth Durvasula
Kavya Sreedhar
Zain Moustafa
Suraj Kothawade
Ashish Gondimalla
Suvinay Subramanian
Narges Shahidi
Nandita Vijaykumar
VGen
98
0
0
20 Sep 2025
FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection
Minji Heo
Simon S. Woo
129
1
0
20 Sep 2025
ArchesClimate: Probabilistic Decadal Ensemble Generation With Flow Matching
Graham Clyne
Guillaume Couairon
Guillaume Gastineau
C. Monteleoni
A. Charantonis
BDL
36
0
0
19 Sep 2025
SAGE: Semantic-Aware Shared Sampling for Efficient Diffusion
Haoran Zhao
Tong Bai
Lei Huang
Xiaoyu Liang
DiffM
MedIm
55
0
0
19 Sep 2025
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Zinan Lin
Enshu Liu
Xuefei Ning
Junyi Zhu
Wenyu Wang
Sergey Yekhanin
AI4CE
213
0
0
19 Sep 2025
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Yanghao Li
Rui Qian
Bowen Pan
Haotian Zhang
Haoshuo Huang
...
Zhengdong Zhang
Chen Chen
Yang Zhao
Ruoming Pang
Zhifeng Chen
MLLM
200
4
0
19 Sep 2025
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
Vatsal Malaviya
Agneet Chatterjee
Maitreya Patel
Yezhou Yang
Chitta Baral
84
0
0
19 Sep 2025
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Kaiwen Zheng
Huayu Chen
Haotian Ye
Haoxiang Wang
Qinsheng Zhang
Kai Jiang
Hang Su
Stefano Ermon
Jun Zhu
Ming-Yu Liu
224
9
0
19 Sep 2025
PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors
Sepehr Dehdashtian
Mashrur M. Morshed
Jacob H. Seidman
Gaurav Bharaj
Vishnu Boddeti
AAML
DiffM
128
0
0
19 Sep 2025
Lynx: Towards High-Fidelity Personalized Video Generation
S. Sang
Tiancheng Zhi
Tianpei Gu
Jing Liu
Linjie Luo
DiffM
VGen
192
3
0
19 Sep 2025
Radiology Report Conditional 3D CT Generation with Multi Encoder Latent diffusion Model
Sina Amirrajab
Zohaib Salahuddin
Sheng Kuang
Henry C. Woodruff
Philippe Lambin
DiffM
MedIm
100
0
0
18 Sep 2025
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Mingsong Li
Lin Liu
Hongjun Wang
Haoxing Chen
Xijun Gu
Shizhan Liu
Dong Gong
Junbo Zhao
Zhenzhong Lan
Jianguo Li
133
0
0
18 Sep 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang
Siteng Huang
Shengke Xue
Yaxi Zhao
Jun Cen
...
Kexiang Wang
Mingxiu Chen
F. Wang
Deli Zhao
Xin Li
VGen
LM&Ro
71
8
0
18 Sep 2025
FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver
Shuangshuang He
Yuanting Zhang
Hongli Liang
Qingye Meng
Xingyuan Yuan
Shuo Wang
145
0
0
18 Sep 2025
LowDiff: Efficient Diffusion Sampling with Low-Resolution Condition
Jiuyi Xu
Qing Jin
Meida Chen
Andrew Feng
Yang Sui
Yangming Shi
DiffM
87
0
0
18 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
212
7
0
17 Sep 2025
BiasMap: Leveraging Cross-Attentions to Discover and Mitigate Hidden Social Biases in Text-to-Image Generation
Rajatsubhra Chakraborty
Xujun Che
Depeng Xu
Cori Faklaris
Xi Niu
Shuhan Yuan
84
0
0
16 Sep 2025
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Zixin Yin
Xili Dai
Duomin Wang
Xianfang Zeng
Lionel M. Ni
Gang Yu
H. Shum
DiffM
133
1
0
15 Sep 2025
Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation
Yufei Tang
Daiheng Gao
Pingyu Wu
Wenbo Zhou
Bang Zhang
Weiming Zhang
DiffM
132
0
0
14 Sep 2025
TrueSkin: Towards Fair and Accurate Skin Tone Recognition and Generation
Haoming Lu
SyDa
94
1
0
13 Sep 2025
MagicMirror: A Large-Scale Dataset and Benchmark for Fine-Grained Artifacts Assessment in Text-to-Image Generation
Jia Wang
Jie Hu
Xiaoqi Ma
Hanghang Ma
Yanbing Zeng
Xiaoming Wei
EGVM
VGen
167
0
0
12 Sep 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Rongyao Fang
Aldrich Yu
Chengqi Duan
Linjiang Huang
S. Bai
Yuxuan Cai
Kun Wang
Si Liu
Xihui Liu
Xue Yang
EGVM
VGen
ReLM
LRM
206
7
0
11 Sep 2025
Unified Multimodal Model as Auto-Encoder
Zhiyuan Yan
Kaiqing Lin
Zongjian Li
Junyan Ye
Hui Han
...
Xue Xu
Xinyan Xiao
Jingdong Wang
Haifeng Wang
Li Yuan
298
1
0
11 Sep 2025
Integrating Anatomical Priors into a Causal Diffusion Model
Binxu Li
Wei Peng
Mingjie Li
Ehsan Adeli
K. Pohl
DiffM
MedIm
121
0
0
10 Sep 2025
RewardDance: Reward Scaling in Visual Generation
Jie Wu
Yu Gao
Zilyu Ye
Ming Li
Liang Li
...
Zeyue Xue
Xiaoxia Hou
Wei Liu
Yan Zeng
Weilin Huang
EGVM
209
15
0
10 Sep 2025
RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction
Zheyuan Hu
Robyn Wu
Naveen Enock
Jasmine Li
Riya Kadakia
Zackory Erickson
Aviral Kumar
104
6
0
09 Sep 2025
ANYPORTAL: Zero-Shot Consistent Video Background Replacement
Wenshuo Gao
Xicheng Lan
Shuai Yang
DiffM
VGen
112
1
0
09 Sep 2025
SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting
Mahtab Dahaghin
Milind G. Padalkar
M. Toso
Alessio Del Bue
3DGS
100
0
0
09 Sep 2025