ResearchTrend.AI
arXiv: 2403.03206
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
    DiffM

Papers citing "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"

50 / 796 papers shown
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Xu He
Xiaoyu Li
Di Kang
Jiangnan Ye
Chaopeng Zhang
Liyang Chen
Xiangjun Gao
Han Zhang
Zhiyong Wu
Haolin Zhuang
DiffM
25
7
0
26 Aug 2024
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho
Samuel Schmidgall
C. Zakka
Mrudang Mathur
Dhamanpreet Kaur
R. Shad
W. Hiesinger
VGen
MedIm
29
6
0
26 Aug 2024
Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing
Zitao Shuai
Chenwei Wu
Zhengxu Tang
Bowen Song
Liyue Shen
33
0
0
23 Aug 2024
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
Yilun Liu
Minggui He
Feiyu Yao
Yuhe Ji
Shimin Tao
...
Jian Gao
Li Zhang
Hao Yang
Boxing Chen
Osamu Yoshie
39
4
0
23 Aug 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
44
159
0
22 Aug 2024
MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation
Kim Yu-Ji
Hyunwoo Ha
Kim Youwang
Jaeheung Surh
Hyowon Ha
Tae-Hyun Oh
22
0
0
21 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
23
145
0
20 Aug 2024
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu
Shaocheng Shen
Qiang Hu
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
19
9
0
20 Aug 2024
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding
Shaobin Zhuang
Kunchang Li
Zhengrong Yue
Yu Qiao
Yali Wang
VGen
22
0
0
20 Aug 2024
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang
Yangming Shi
Yunwen Huang
Feng Chen
Yin Zheng
Lei Zhang
DiffM
VGen
59
0
0
19 Aug 2024
Detecting the Undetectable: Combining Kolmogorov-Arnold Networks and MLP for AI-Generated Image Detection
Taharim Rahman Anon
Jakaria Islam Emon
27
3
0
18 Aug 2024
Are CLIP features all you need for Universal Synthetic Image Origin Attribution?
Dario Cioni
Christos Tzelepis
Lorenzo Seidenari
Ioannis Patras
35
2
0
17 Aug 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
M. Zhang
DiffM
45
1
0
16 Aug 2024
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
Chenjie Cao
Chaohui Yu
Yanwei Fu
Fan Wang
Xiangyang Xue
VGen
45
7
0
15 Aug 2024
Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models
Chenqian Yan
Songwei Liu
Hongjian Liu
Xurui Peng
Xiaojian Wang
Fangming Chen
Lean Fu
Xing Mei
18
6
0
13 Aug 2024
Music2Latent: Consistency Autoencoders for Latent Audio Compression
Marco Pasini
Stefan Lattner
George Fazekas
14
6
0
12 Aug 2024
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Junjie He
Yifeng Geng
Liefeng Bo
DiffM
44
20
0
12 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
72
384
0
12 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
53
11
0
11 Aug 2024
FlowDreamer: exploring high fidelity text-to-3D generation via rectified flow
Hangyu Li
Xiangxiang Chu
Dingyuan Shi
Lin Wang
45
3
0
09 Aug 2024
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
Ciara Rowles
Shimon Vainer
Dante De Nigris
Slava Elizarov
Konstantin Kutsy
Simon Donné
DiffM
29
9
0
06 Aug 2024
Body of Her: A Preliminary Study on End-to-End Humanoid Agent
Tenglong Ao
LM&Ro
18
1
0
06 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Yu Qiao
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
60
48
0
05 Aug 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
Qian Zhang
Xiangzi Dai
Ninghua Yang
Xiang An
Ziyong Feng
Xingyu Ren
VLM
CLIP
35
17
0
02 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
Yuanqing Wang
Arka Daw
M. Maruf
Josef C. Uyeda
Wasila Dahdul
...
James P. Balhoff
Kyunghyun Cho
Charles V. Stewart
Tanya Berger-Wolf
Anuj Karpatne
AI4CE
19
1
0
31 Jul 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
36
1
0
29 Jul 2024
RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching
Letian Gao
Zhi John Lu
22
0
0
29 Jul 2024
MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models
Chunsan Hong
Tae-Hyun Oh
Minhyuk Sung
VLM
EGVM
24
0
0
24 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
32
8
0
22 Jul 2024
DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving
Jiahang Tu
Wei Ji
Han Zhao
Chao Zhang
Roger Zimmermann
Hui Qian
28
5
0
22 Jul 2024
Discrete Flow Matching
Itai Gat
Tal Remez
Neta Shaul
Felix Kreuk
Ricky T. Q. Chen
Gabriel Synnaeve
Yossi Adi
Y. Lipman
DiffM
29
56
0
22 Jul 2024
Stable Audio Open
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
61
36
0
19 Jul 2024
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Junseo Park
Hyeryung Jang
70
0
0
17 Jul 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
54
15
0
16 Jul 2024
DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training
Guillermo Jiménez-Pérez
Pedro Osório
Josef Cersovsky
Javier Montalt-Tordera
Jens Hooge
Steffen Vogler
Sadegh Mohammadi
MedIm
32
2
0
16 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
30
1
0
15 Jul 2024
Several questions of visual generation in 2024
Shuyang Gu
22
1
0
11 Jul 2024
Generative Image as Action Models
Mohit Shridhar
Yat Long Lo
Stephen James
38
6
0
10 Jul 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He
Siming Fu
Mushui Liu
Xierui Wang
Wenyi Xiao
...
Zhelun Yu
Haoyuan Li
Ziwei Huang
Leilei Gan
Hao Jiang
DiffM
24
23
0
10 Jul 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang
Zhenglin Cheng
Yuanyu He
Mengna Wang
Yongliang Shen
...
Guiyang Hou
Mingqian He
Yanna Ma
Weiming Lu
Yueting Zhuang
SyDa
59
9
0
09 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
61
36
0
08 Jul 2024
LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction
Kanghao Chen
Hangyu Li
Jiazhou Zhou
Zeyu Wang
Lin Wang
DiffM
VGen
36
1
0
08 Jul 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
29
41
0
07 Jul 2024
Replication in Visual Diffusion Models: A Survey and Outlook
Wenhao Wang
Yifan Sun
Zongxin Yang
Zhengdong Hu
Zhentao Tan
Yi Yang
66
6
0
07 Jul 2024
Improved Noise Schedule for Diffusion Training
Tiankai Hang
Shuyang Gu
DiffM
16
10
0
03 Jul 2024
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
Ling Yang
Zixiang Zhang
Zhilong Zhang
Xingchao Liu
Minkai Xu
Wentao Zhang
Chenlin Meng
Stefano Ermon
Bin Cui
36
18
0
02 Jul 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma
Yonglin Deng
Chen Chen
H. Lu
Zhenyu Yang
VLM
DiffM
82
6
0
02 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
73
68
0
02 Jul 2024