Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.10789
Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
50 / 864 papers shown
Title
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Y. Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
115
3
0
13 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
77
1
0
11 Mar 2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
...
Qi Zhang
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
DiffM
VLM
55
5
0
10 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi-Long Han
Yandong Tang
Liangqiong Qu
40
0
0
10 Mar 2025
LatexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin
Zhenbo Yu
Yang Shen
Zhenyong Fu
Jian Yang
DiffM
60
0
0
10 Mar 2025
Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Amir Mohammad Izadi
Seyed Mohsen Hosseini
Soroush Vafaie Tabar
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM
40
0
0
09 Mar 2025
ROCM: RLHF on consistency models
Shivanshu Shekhar
Tong Zhang
38
0
0
08 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
64
1
0
08 Mar 2025
Unified Reward Model for Multimodal Understanding and Generation
Yibin Wang
Yuhang Zang
Hao Li
Cheng Jin
J. Wang
EGVM
54
4
0
07 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
37
1
0
07 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
67
0
0
03 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
30
0
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
45
1
0
02 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
76
6
0
27 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
J. Liu
41
0
0
23 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
120
2
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
47
13
0
20 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
67
5
0
17 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
62
0
0
10 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Z. Yang
Mike Zheng Shou
MoE
63
0
0
10 Feb 2025
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
Jinya Sakurai
Issei Sato
74
0
0
06 Feb 2025
HuViDPO:Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang
Boxi Wu
Jiahui Zhang
Xiaotong Guan
Shuang Chen
VGen
61
1
0
02 Feb 2025
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models
Xinle Cheng
Zhuoming Chen
Zhihao Jia
DiffM
VLM
41
1
0
01 Feb 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun
Ana Serrano
B. Masiá
Matheus Gadelha
Yannick Hold-Geoffroy
Xin Sun
Diego F. F. Gutierrez
DiffM
VGen
45
0
0
22 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
106
4
0
21 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
95
16
0
17 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
43
6
0
13 Jan 2025
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing
Avinab Saha
Junfeng He
Susan Hao
Paul Vicol
...
Sahil Singla
Sarah Young
Yinxiao Li
Feng Yang
Deepak Ramachandran
DiffM
48
0
0
11 Jan 2025
INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models
Di Jin
Xing Liu
Yu Liu
Jia Qing Yap
Andrea Wong
Adriana Crespo
Qi Lin
Zhiyuan Yin
Qiang Yan
Ryan Ye
EGVM
VLM
59
0
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
X. Wang
DiffM
38
3
0
08 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
51
10
0
08 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
76
2
0
03 Jan 2025
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari
R. Bhavya
Shruti Jayaraman
V. Mary Anita Rajam
DiffM
VGen
25
0
0
02 Jan 2025
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee
Soyeong Kwon
Taehwan Kim
41
5
0
31 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
68
7
0
24 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
74
2
0
20 Dec 2024
Parallelized Autoregressive Visual Generation
Y. Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
84
11
0
19 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
120
8
0
19 Dec 2024
Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
Rida Qadri
Piotr Mirowski
Aroussiak Gabriellan
Farbod Mehr
Huma Gupta
Pamela Karimi
Remi Denton
67
0
0
18 Dec 2024
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
Jiancheng Huang
Yi Huang
Jianzhuang Liu
Donghao Zhou
Y. Liu
Shifeng Chen
DiffM
77
0
0
15 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Y. Wang
Kuan-Chieh Jackson Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
X. Wang
VGen
83
1
0
12 Dec 2024
Personalized and Sequential Text-to-Image Generation
Ofir Nabati
Guy Tennenholtz
ChihWei Hsu
Moonkyung Ryu
Deepak Ramachandran
Yinlam Chow
Xiang Li
Craig Boutilier
MLLM
70
0
0
10 Dec 2024
[MASK] is All You Need
Vincent Tao Hu
Bjorn Ommer
DiffM
135
2
0
09 Dec 2024
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
Ziwei Huang
Wanggui He
Quanyu Long
Yandi Wang
Haoyuan Li
...
Fangxun Shu
Long Chen
Hao Jiang
Leilei Gan
Fei Wu
EGVM
109
3
0
05 Dec 2024
MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
Shan Yang
DiffM
71
0
0
02 Dec 2024
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud
Sergey Lavrushkin
Alexey Kirillov
D. Vatolin
84
0
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
106
2
0
02 Dec 2024
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie
Dong Gong
82
1
0
01 Dec 2024
Continuous Concepts Removal in Text-to-image Diffusion Models
Tingxu Han
Weisong Sun
Yanrong Hu
Chunrong Fang
Yonglong Zhang
Shiqing Ma
Tao Zheng
Zhenyu Chen
Zhenting Wang
DiffM
103
2
0
30 Nov 2024
Previous
1
2
3
4
5
...
16
17
18
Next