2403.03206
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
HuggingFace (68 upvotes)
Papers citing
"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 1,227 papers shown
Title
LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
Yuzhuo Chen
Zehua Ma
Jianhua Wang
Kai Kang
Shunyu Yao
Weiming Zhang
VLM
133
2
0
24 Dec 2025
Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
Hang Xu
Linjiang Huang
Feng Zhao
DiffM
57
0
0
03 Dec 2025
Taming Camera-Controlled Video Generation with Verifiable Geometry Reward
Zhaoqing Wang
Xiaobo Xia
Zhuolin Bie
Jinlin Liu
Dongdong Yu
Jia-Wang Bian
Changhu Wang
EGVM
VGen
129
0
0
02 Dec 2025
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Qinghe Wang
Xiaoyu Shi
Baolu Li
Weikang Bian
Quande Liu
Huchuan Lu
Xintao Wang
Pengfei Wan
Kun Gai
Xu Jia
VGen
194
1
0
02 Dec 2025
YingVideo-MV: Music-Driven Multi-Stage Video Generation
Jiahui Chen
Weida Wang
Runhua Shi
Huan Yang
Chaofan Ding
Zihao Chen
DiffM
VGen
161
0
0
02 Dec 2025
WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens
Jian Yang
Dacheng Yin
Xiaoxuan He
Y. Li
Fengyun Rao
Jing Lyu
Wei-dong Zhai
Yang Cao
Zheng-Jun Zha
VLM
170
0
0
02 Dec 2025
PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution
Zhongbao Yang
Jiangxin Dong
Yazhou Yao
Jinhui Tang
Jinshan Pan
120
0
0
02 Dec 2025
DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models
Patrick Kwon
Chen Chen
DiffM
AI4TS
VGen
128
0
0
01 Dec 2025
Generative Video Motion Editing with 3D Point Tracks
Yao-Chih Lee
Zhoutong Zhang
Jiahui Huang
Jui-Hsien Wang
Joon-Young Lee
Jia-Bin Huang
Eli Shechtman
Zhengqi Li
DiffM
VGen
3DPC
221
0
0
01 Dec 2025
Reversible Inversion for Training-Free Exemplar-guided Image Editing
Yuke Li
Lianli Gao
Ji Zhang
Pengpeng Zeng
Lichuan Xiang
Hongkai Wen
Heng Tao Shen
Jingkuan Song
DiffM
84
0
0
01 Dec 2025
FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing
Yucheng Liao
Jiajun Liang
Kaiqian Cui
Baoquan Zhao
Haoran Xie
Wei Liu
Qing Li
Xudong Mao
100
0
0
01 Dec 2025
FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Kevin David Hayes
Micah Goldblum
Vikash Sehwag
Gowthami Somepalli
Ashwinee Panda
Tom Goldstein
MLLM
EGVM
224
0
0
01 Dec 2025
Spatiotemporal Pyramid Flow Matching for Climate Emulation
Jeremy Irvin
Jiaqi Han
Z. Wang
Abdulaziz Alharbi
Yufei Zhao
Nomin-Erdene Bayarsaikhan
Daniele Visioni
A. Ng
Duncan Watson-Parris
AI4TS
72
0
0
01 Dec 2025
ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers
Yiyang Ma
Feng Zhou
Xuedan Yin
Pu Cao
Yonghao Dang
Jianqin Yin
52
0
0
01 Dec 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu
Weiming Ren
Haozhe Liu
Zijian Zhou
S. Chen
...
Ping Luo
Wei Liu
Tao Xiang
Jonas Schult
Yuren Cong
116
0
0
01 Dec 2025
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
Seungho Choi
Jeahun Sung
Jihyong Oh
DiffM
110
0
0
01 Dec 2025
Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer
Dong In Lee
Hyungjun Doh
Seunggeun Chi
Runlin Duan
Sangpil Kim
K. Ramani
DiffM
3DGS
VGen
120
0
0
30 Nov 2025
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
Qiang Lyu
Z. Chen
C. Wang
Haolin Shi
Shibo Gao
...
Jianlou Si
Fei Ding
Jing Li
Chun Pong Lau
Weiqiang Wang
EGVM
95
0
0
30 Nov 2025
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Junyan Ye
Leiqi Zhu
Yuncheng Guo
Dongzhi Jiang
Zilong Huang
Yifan Zhang
Zhiyuan Yan
Haohuan Fu
Conghui He
Weijia Li
EGVM
100
0
0
29 Nov 2025
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Z. Wang
Jie M. Zhang
Shiguang Shan
Xilin Chen
AAML
312
0
0
29 Nov 2025
Vision Bridge Transformer at Scale
Zhenxiong Tan
Zeqing Wang
Xingyi Yang
Songhua Liu
Xinchao Wang
DiffM
64
0
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
124
0
0
28 Nov 2025
Ovis-Image Technical Report
Guo-Hua Wang
Liangfu Cao
Tianyu Cui
Minghao Fu
Xiaohao Chen
...
Jianshan Zhao
Lan Li
Bowen Fu
Jiaqi Liu
Qing-Guo Chen
VLM
448
0
0
28 Nov 2025
GOATex: Geometry & Occlusion-Aware Texturing
Hyunjin Kim
Kunho Kim
Adam Lee
Wonkwang Lee
DiffM
52
0
0
28 Nov 2025
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
S. Shi
Jing Xu
Zhihang Li
Chunli Peng
Xiaoda Yang
Lijing Lu
Kai Hu
Jiangning Zhang
DiffM
56
0
0
28 Nov 2025
REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection
Huangsen Cao
Qin Mei
Zhiheng Li
Yuxi Li
Ying Zhang
...
Zhimeng Zhang
Xin Ding
Yongwei Wang
Jing Lyu
Fei Wu
80
0
0
28 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
193
0
0
28 Nov 2025
Visual Generation Tuning
Jiahao Guo
Sinan Du
J. Yao
Wenyu Liu
Bo Li
Haoxiang Cao
Kun Gai
C. Yuan
Kai Wu
Xinggang Wang
VLM
245
0
0
28 Nov 2025
Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra
Deressa Wodajo Deressa
Hannes Mareen
Peter Lambert
Glenn Van Wallendael
36
0
0
27 Nov 2025
Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models
Seoyun Yang
Gihoon Kim
Taesup Kim
56
0
0
27 Nov 2025
Adversarial Flow Models
Shanchuan Lin
Ceyuan Yang
Zhijie Lin
Hao Chen
Haoqi Fan
GAN
128
0
0
27 Nov 2025
Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage
Peiyu Yu
Suraj Kothawade
Sirui Xie
Ying Nian Wu
Hongliang Fei
108
0
0
27 Nov 2025
PROMPTMINER: Black-Box Prompt Stealing against Text-to-Image Generative Models via Reinforcement Learning and Fuzz Optimization
Mingzhe Li
Renhao Zhang
Zhiyang Wen
Siqi Pan
Bruno Castro da Silva
Juan Zhai
Shiqing Ma
32
0
0
27 Nov 2025
Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration
M. Yang
Yanming Yang
Chenyi Xu
Chenxi Song
Yufan Zuo
Tong Zhao
Ruibo Li
Chi Zhang
DiffM
108
0
0
27 Nov 2025
MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization
Yingjie Xia
X. Wang
Jinglei Shi
Vicky Kalogeiton
Jian Yang
EGVM
VGen
518
0
0
26 Nov 2025
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
Joonhyung Park
Hyeongwon Jang
Joowon Kim
Eunho Yang
VLM
128
0
0
26 Nov 2025
CaliTex: Geometry-Calibrated Attention for View-Coherent 3D Texture Generation
Chenyu Liu
Hongze Chen
Jingzhi Bao
Lingting Zhu
Runze Zhang
Weikai Chen
Zeyu Hu
Yingda Yin
Keyang Luo
Xin Wang
DiffM
159
0
0
26 Nov 2025
FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation
Kaixing Yang
Xulong Tang
Ziqiao Peng
X. Zhang
Puwei Wang
Jun He
Hongyan Liu
180
1
0
26 Nov 2025
Deep Parameter Interpolation for Scalar Conditioning
Chicago Y. Park
Michael T. McCann
Cristina Garcia-Cardona
B. Wohlberg
Ulugbek S. Kamilov
AI4CE
269
0
0
26 Nov 2025
MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
Shuai Zhang
Bao Tang
Siyuan Yu
Yueting Zhu
Jingfeng Yao
Ya Zou
Shanglin Yuan
Li Yu
Wenyu Liu
Xinggang Wang
DiffM
VGen
197
0
0
26 Nov 2025
3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Y. Li
Heyu Si
Federico Landi
Pilar Oplustil Gallegos
Ioannis Koutsoumpas
...
Ruiju Fu
Qi Guo
Xin Jin
Shunyu Liu
Mingli Song
DiffM
VGen
144
0
0
26 Nov 2025
ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding
Byeongjun Park
Byung-Hoon Kim
Hyungjin Chung
Jong Chul Ye
VGen
179
0
0
25 Nov 2025
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Guanjie Chen
Shirui Huang
Kai Liu
J. Zhu
Xiaoye Qu
Peng Chen
Yu Cheng
Yifu Sun
172
1
0
25 Nov 2025
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Xiang Wang
Zhifei Zhang
Chentao Song
Zhe Lin
Yuqian Zhou
...
Haitian Zheng
Jason Kuen
Yuehuan Wang
Changxin Gao
Nong Sang
MoE
149
0
0
25 Nov 2025
PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan
Teng-Fang Hsiao
Ling Lo
Yi-Lun Wu
Hong-Han Shuai
DiffM
VLM
169
0
0
25 Nov 2025
CREward: A Type-Specific Creativity Reward Model
Jiyeon Han
Ali Mahdavi-Amiri
Hao Zhang
Haedong Jeong
84
0
0
25 Nov 2025
Restora-Flow: Mask-Guided Image Restoration with Flow Matching
Arnela Hadzic
Franz Thaler
Lea Bogensperger
Simon Johannes Joham
M. Urschler
DiffM
452
0
0
25 Nov 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
Ziheng Ouyang
Yiren Song
Y. Liu
Shihao Zhu
Qibin Hou
Ming-Ming Cheng
Mike Zheng Shou
104
0
0
25 Nov 2025
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
Xuelu Feng
Yunsheng Li
Ziyu Wan
Zixuan Gao
Junsong Yuan
Dongdong Chen
Chunming Qiao
EGVM
237
0
0
25 Nov 2025
Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization
Debin Meng
Chen Jin
Zheng Gao
Yanran Li
Ioannis Patras
Georgios Tzimiropoulos
DiffM
256
0
0
25 Nov 2025