ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.10789
  4. Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM
ArXiv (abs)PDFHTMLHuggingFace (4 upvotes)

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
Huzheng Yang
Katherine Xu
Michael D. Grossberg
Yutong Bai
Jianbo Shi
257
0
0
21 Apr 2025
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
Liang Peng
Boxi Wu
Haoran Cheng
Yibo Zhao
Xiaofei He
197
0
0
20 Apr 2025
Personalized Text-to-Image Generation with Auto-Regressive Models
Personalized Text-to-Image Generation with Auto-Regressive Models
Kaiyue Sun
Xian Liu
Yao Teng
Xihui Liu
324
6
0
17 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
408
62
0
15 Apr 2025
Autoregressive Distillation of Diffusion Transformers
Autoregressive Distillation of Diffusion TransformersComputer Vision and Pattern Recognition (CVPR), 2025
Yeongmin Kim
Sotiris Anagnostidis
Yuming Du
Edgar Schönfeld
Jonas Kohler
Markos Georgopoulos
Albert Pumarola
Ali K. Thabet
A. Sanakoyeu
309
2
0
15 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Yihan Hu
Yuanxing Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Yan Li
Di Zhang
377
1
0
14 Apr 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li
Ruoyi Du
Juncheng Yan
Le Zhuo
Zhen Li
Peng Gao
Zhanyu Ma
Ming-Ming Cheng
Ming-Ming Cheng
VLM
362
20
0
10 Apr 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
Mengchen Zhang
Tong Wu
Jing Tan
Yu Qiao
Gordon Wetzstein
Dahua Lin
VGen
391
6
0
09 Apr 2025
Transfer between Modalities with MetaQueries
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
291
119
0
08 Apr 2025
CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model
CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion ModelInternational Symposium on Circuits and Systems (ISCAS), 2025
Jinming Lu
Minghao She
Wendong Mao
Zhongfeng Wang
MQ
120
0
0
08 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
321
9
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
341
15
0
03 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
366
31
0
02 Apr 2025
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Shaojin Wu
Mengqi Huang
Wenxu Wu
Yufeng Cheng
Fei Ding
Qian He
DiffM
349
81
0
02 Apr 2025
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Guy Kaplan
Michael Toker
Yuval Reif
Yonatan Belinkov
Roy Schwartz
DiffM
363
2
0
01 Apr 2025
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang
Lanqing Guo
Zhihao Li
Jiaxing Huang
Pichao Wang
Bihan Wen
Jingchao Wang
DiffM
286
7
0
31 Mar 2025
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image SynthesisComputer Vision and Pattern Recognition (CVPR), 2025
Woojung Han
Yeonkyung Lee
Chanyoung Kim
Kwanghyun Park
Seong Jae Hwang
DiffM
276
5
0
28 Mar 2025
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
323
0
0
27 Mar 2025
Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation
Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation
Qi Si
Bo Wang
Zhao Zhang
331
2
0
26 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Min Li
DiffM
290
2
0
26 Mar 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao
Zhiyong Yang
Linjie Li
Dianqi Li
Kevin Qinghong Lin
Yu Cheng
Lijuan Wang
MLLMLRM
262
21
0
25 Mar 2025
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model AlignmentComputer Vision and Pattern Recognition (CVPR), 2025
Yaojie Lu
Qichao Wang
H. Cao
Xierui Wang
Xiaoyin Xu
Min Zhang
329
9
0
24 Mar 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Oucheng Huang
Yuhang Ma
Zeng Zhao
Mingrui Wu
Jinfa Huang
Rongsheng Zhang
Zhibo Hu
Xiaoshuai Sun
Rongrong Ji
298
3
0
22 Mar 2025
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
Yuheng Feng
Jianhui Wang
Kun Li
Sida Li
Tianyu Shi
Haoyue Han
Miao Zhang
Xueqian Wang
DiffM
1.2K
0
0
22 Mar 2025
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait SynthesisComputer Vision and Pattern Recognition (CVPR), 2025
Mengtian Li
Jinshu Chen
Wanquan Feng
Bingchuan Li
Fei Dai
Mingcong Liu
Qian He
3DH
246
3
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Halton Scheduler For Masked Generative Image TransformerInternational Conference on Learning Representations (ICLR), 2025
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
266
21
0
21 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
312
0
0
21 Mar 2025
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Zero-Shot Styled Text Image Generation, but Make It AutoregressiveComputer Vision and Pattern Recognition (CVPR), 2025
Vittorio Pippi
Fabio Quattrini
S. Cascianelli
Alessio Tonioni
Rita Cucchiara
325
8
0
21 Mar 2025
Scale-wise Distillation of Diffusion Models
Scale-wise Distillation of Diffusion Models
Nikita Starodubcev
Denis Kuznedelev
Artem Babenko
Dmitry Baranchuk
DiffM
287
4
0
20 Mar 2025
ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
Vihaan Misra
Peter Schaldenbrand
Jean Oh
DiffM
281
4
0
18 Mar 2025
Comp-Attn: Present-and-Align Attention for Compositional Video Generation
Comp-Attn: Present-and-Align Attention for Compositional Video Generation
Hongyu Zhang
Yufan Deng
Shenghai Yuan
Peng Jin
Zesen Cheng
Yian Zhao
Yu Xie
Jie Chen
DiffMVGen
616
0
0
18 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
342
8
0
17 Mar 2025
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou
Mingwei Li
Zongxin Yang
Yi Yang
489
15
0
17 Mar 2025
Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
Shiran Yuan
Hao Zhao
DiffM
329
0
0
17 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
278
48
0
17 Mar 2025
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Lijie Fan
Luming Tang
Siyang Qin
Tianhong Li
Xuan S. Yang
...
Tao Zhu
Michael Rubinstein
Michalis Raptis
Deqing Sun
Radu Soricut
316
26
0
17 Mar 2025
FedGAI: Federated Style Learning with Cloud-Edge Collaboration for Generative AI in Fashion Design
FedGAI: Federated Style Learning with Cloud-Edge Collaboration for Generative AI in Fashion Design
Mingzhu Wu
Jianan Jiang
Xinglin Li
Hanhui Deng
Di Wu
FedML
376
1
0
16 Mar 2025
BalancedDPO: Adaptive Multi-Metric Alignment
BalancedDPO: Adaptive Multi-Metric Alignment
Dipesh Tamboli
Souradip Chakraborty
Aditya Malusare
B. Banerjee
Amrit Singh Bedi
Vaneet Aggarwal
EGVM
225
2
0
16 Mar 2025
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan
Qingyan Wei
Jiayi Tang
Jiajun Li
Yidan Wang
Xuming Hu
Huiqi Li
Linfeng Zhang
238
6
0
16 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
335
21
0
15 Mar 2025
Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking
Ziyi Wang
Songbai Tan
Gang Xu
Xuerui Qiu
Hongbin Xu
Xin Meng
Ming Li
Fei Richard Yu
WIGM
368
0
0
14 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
526
12
0
13 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
MambaMLLM
328
6
0
11 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi Han
Yandong Tang
Liangqiong Qu
428
0
0
10 Mar 2025
LatexBlend: Scaling Multi-concept Customized Generation with Latent Textual BlendingComputer Vision and Pattern Recognition (CVPR), 2025
Jian Jin
Zhenbo Yu
Yang Shen
Zhenyong Fu
Zhiqiang Wang
DiffM
310
5
0
10 Mar 2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
...
Tao Gui
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
DiffMVLM
362
41
0
10 Mar 2025
Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Amir Mohammad Izadi
Seyed Mohammad Hadi Hosseini
Soroush Vafaie Tabar
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM
318
3
0
09 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
564
5
0
08 Mar 2025
ROCM: RLHF on consistency models
Shivanshu Shekhar
Tong Zhang
203
1
0
08 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
256
22
0
07 Mar 2025
Previous
12345...192021
Next