2403.03206
Cited By
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Müller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
HuggingFace (68 upvotes)
Papers citing "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 1,247 papers shown
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyu Zhao
Xiaoguang Li
Linjia Shi
Nasrin Imanpour
Song Wang
VLM
241
2
0
16 May 2025
DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning
Weilai Xiang
Hongyu Yang
Di Huang
Yunhong Wang
447
3
0
16 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2025
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
280
9
0
15 May 2025
Path Gradients after Flow Matching
Lorenz Vaitl
Leon Klein
313
1
0
15 May 2025
Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi
Jianzhong Liang
Rongchang Xie
Xian Wu
Cheng Chen
Chang Liu
VGen
375
0
0
14 May 2025
Fast Text-to-Audio Generation with Adversarial Post-Training
Cheng-i Wang
Zach Evans
Zack Zukowski
Josiah Taylor
CJ Carr
...
Adnan Al-Sinan
Gian Marco Iodice
Julian McAuley
Taylor Berg-Kirkpatrick
Jordi Pons
516
8
0
13 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
540
144
0
12 May 2025
Improving Trajectory Stitching with Flow Models
Reece O'Mahoney
Wanming Yu
Ioannis Havoutis
417
0
0
12 May 2025
FLUXSynID: A Framework for Identity-Controlled Synthetic Face Generation with Document and Live Images
Raul Ismayilov
Dzemila Sero
Luuk Spreeuwers
487
1
0
12 May 2025
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou
Zeyu Li
Xingyu Jiang
Haoyang Li
Lijun Yang
Wen Yao
Yue Deng
DiffM
529
0
0
12 May 2025
H³DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
Xinyu Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
397
4
0
12 May 2025
Addressing degeneracies in latent interpolation for diffusion models
Scandinavian Conference on Image Analysis (SCIA), 2025
Erik Landolsi
Fredrik Kahl
DiffM
312
0
0
12 May 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Computer Vision and Pattern Recognition (CVPR), 2025
Zhiyuan Chen
Keyi Li
Yifan Jia
Le Ye
Yufei Ma
DiffM
312
5
0
09 May 2025
From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
Moritz Vandenhirtz
Julia E. Vogt
401
1
0
09 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Jiachen Liu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
672
8
0
09 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
829
178
0
08 May 2025
Does CLIP perceive art the same way we do?
Andrea Asperti
Leonardo Dessì
Maria Chiara Tonetti
Nico Wu
342
1
0
08 May 2025
InstanceGen: Image Generation with Instance-level Instructions
Etai Sella
Yanir Kleiman
Hadar Averbuch-Elor
424
4
0
08 May 2025
Defining and Quantifying Creative Behavior in Popular Image Generators
Aditi Ramaswamy
Hana Chockler
Melane Navaratnarajah
233
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
515
37
0
07 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
271
5
0
07 May 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang
Junhao Zhuang
Zhaoyang Zhang
Ying Shan
Yansong Tang
VGen
351
13
0
06 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
Mingxing Li
Dongyang Jin
Xiaowen Chu
Lei Sun
Lei-huan Sun
Xiangxiang Chu
DiffM
983
15
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
572
10
0
05 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
Chong Chen
Sijie Zhu
DiffM
454
12
0
05 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
541
16
0
05 May 2025
T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Yunfeng Ge
Jiawei Li
Yiji Zhao
Haomin Wen
Zhao Li
M. Qiu
Haoyang Li
Ming Jin
Xiaojun Jia
DiffM
770
6
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.1K
32
0
05 May 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Huu Dat
Nam Hyeonu
Po Yuan Mao
Tae-Hyun Oh
DiffM
CoGe
286
2
0
02 May 2025
Improving Editability in Image Generation with Layer-wise Memory
Computer Vision and Pattern Recognition (CVPR), 2025
Daneul Kim
Jaeah Lee
Jaesik Park
DiffM
KELM
297
1
0
02 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Jingdong Sun
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
654
4
0
01 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
387
1
0
01 May 2025
Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
Hong Zhang
Zhongjie Duan
Xingjun Wang
Yuze Zhao
Weiyi Lu
Zhipeng Di
Yongjun Xu
Yingda Chen
Yu Zhang
MLLM
524
6
0
30 Apr 2025
ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
511
5
0
30 Apr 2025
PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking
Xiatao Sun
Yinxing Chen
Daniel Rakita
VGen
393
5
0
29 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
535
3
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
380
8
0
29 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
283
3
0
28 Apr 2025
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos
Computer Vision and Pattern Recognition (CVPR), 2025
Yuan Li
Ziqian Bai
Feitong Tan
Zhaopeng Cui
S. Fanello
Yinda Zhang
DiffM
VGen
316
1
0
27 Apr 2025
Learning to Drive from a World Model
Mitchell Goff
Greg Hogan
George Hotz
Armand du Parc Locmaria
Kacper Raczy
Harald Schäfer
Adeeb Shihadeh
Weixing Zhang
Yassine Yousfi
216
5
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
247
1
0
26 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
507
12
0
24 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
Shixuan Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Wei Wei
Gang Yu
Daxin Jiang
DiffM
762
172
0
24 Apr 2025
DreamO: A Unified Framework for Image Customization
Chong Mou
Yanze Wu
Wenxu Wu
Zinan Guo
Pengze Zhang
...
Shaojin Wu
Songtao Zhao
Jian Zhang
Qian He
Xinglong Wu
585
48
0
23 Apr 2025
DiTPainter: Efficient Video Inpainting with Diffusion Transformers
Xian Wu
Chang Liu
DiffM
365
2
0
22 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lujie Niu
Huixing Jiang
Chen Wei
Fangkun Zhao
Ruifan Li
Fangxiang Feng
DiffM
496
1
0
22 Apr 2025
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation
Weijie He
Mushui Liu
YunLong Yu
Zhao Wang
Chao Wu
DiffM
VGen
369
1
0
21 Apr 2025
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Computer Vision and Pattern Recognition (CVPR), 2025
Junyuan Deng
Xinyi Wu
Yongxing Yang
Congchao Zhu
Song Wang
Zhenyao Wu
319
3
0
21 Apr 2025
"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
Huzheng Yang
Katherine Xu
Michael D. Grossberg
Yutong Bai
Jianbo Shi
257
0
0
21 Apr 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
472
11
0
21 Apr 2025