2206.10789
Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
Papers citing
"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
50 / 1,010 papers shown
Unified Reward Model for Multimodal Understanding and Generation
Yibin Wang
Yuhang Zang
Hao Li
Cheng Jin
Jiadong Wang
EGVM
395
78
0
07 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2025
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
202
6
0
03 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Computer Vision and Pattern Recognition (CVPR), 2025
Haoxin Li
Boyang Li
CoGe
692
4
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
541
9
0
02 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
742
0
0
27 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
537
46
0
27 Feb 2025
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
IEEE transactions on multimedia (TMM), 2025
Kang Fu
Huiyu Duan
Zicheng Zhang
Xiaohong Liu
Xiongkuo Min
Jia Wang
Guoquan Zheng
EGVM
151
4
0
24 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
Jing Liu
262
4
0
23 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Neural Information Processing Systems (NeurIPS), 2024
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
461
16
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
International Conference on Learning Representations (ICLR), 2024
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
423
61
0
20 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
690
43
0
13 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
702
2
0
10 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
351
7
0
10 Feb 2025
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
Jinya Sakurai
Issei Sato
509
3
0
06 Feb 2025
HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang
Boxi Wu
Jiahui Zhang
Xiaotong Guan
Shuang Chen
VGen
256
7
0
02 Feb 2025
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models
Xinle Cheng
Zhuoming Chen
Zhihao Jia
DiffM
VLM
200
8
0
01 Feb 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Edurne Bernal-Berdun
Ana Serrano
B. Masiá
Matheus Gadelha
Yannick Hold-Geoffroy
Xin Sun
Diego F. F. Gutierrez
DiffM
VGen
216
9
0
22 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
338
29
0
21 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
771
71
0
17 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
433
25
0
13 Jan 2025
Personalized Preference Fine-tuning of Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2025
Meihua Dang
Anikait Singh
Linqi Zhou
Stefano Ermon
Jiaming Song
132
13
0
11 Jan 2025
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoying Xing
Avinab Saha
Junfeng He
Susan Hao
Paul Vicol
...
Sahil Singla
Sarah Young
Yinxiao Li
Feng Yang
Deepak Ramachandran
DiffM
306
3
0
11 Jan 2025
INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models
Di Jin
Xing Liu
Yu Liu
Jia Qing Yap
Andrea Wong
Adriana Crespo
Qi Lin
Zhiyuan Yin
Qiang Yan
Ryan Ye
EGVM
VLM
1.1K
1
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Computer Vision and Pattern Recognition (CVPR), 2025
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
253
23
0
08 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
272
14
0
08 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Patterns (Patterns), 2024
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
451
4
0
03 Jan 2025
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari
R. Bhavya
Shruti Jayaraman
V. Mary Anita Rajam
DiffM
VGen
325
0
0
02 Jan 2025
Grid Diffusion Models for Text-to-Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Taegyeong Lee
Soyeong Kwon
Taehwan Kim
313
19
0
31 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Rundong Wang
1.0K
44
0
24 Dec 2024
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Yong Liu
...
Wanli Ouyang
Zhiwei Xiong
Shiyang Feng
Qibin Hou
Ming-Ming Cheng
667
9
0
22 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
427
6
0
20 Dec 2024
Parallelized Autoregressive Visual Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
649
37
0
19 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
633
21
0
19 Dec 2024
Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
Rida Qadri
Piotr Mirowski
Aroussiak Gabriellan
Farbod Mehr
Huma Gupta
Pamela Karimi
Remi Denton
225
2
0
18 Dec 2024
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Jiancheng Huang
Yi Huang
Jianzhuang Liu
Donghao Zhou
Wenshu Fan
Shifeng Chen
DiffM
317
9
0
15 Dec 2024
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Computer Vision and Pattern Recognition (CVPR), 2024
Dongting Hu
Jierun Chen
Xijie Huang
Huseyin Coskun
Arpit Sahni
...
Mingming Gong
Sergey Tulyakov
Vidit Goel
Yanwu Xu
Jian Ren
VLM
305
17
0
12 Dec 2024
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Enis Simsar
Thomas Hofmann
F. Tombari
Pinar Yanardag
MoMe
364
7
0
12 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Longji Xu
Kuan-Chieh Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
Xinze Wang
VGen
714
5
0
12 Dec 2024
[MASK] is All You Need
Vincent Tao Hu
Bjorn Ommer
DiffM
528
8
0
09 Dec 2024
Nested Diffusion Models Using Hierarchical Latent Priors
Computer Vision and Pattern Recognition (CVPR), 2024
Xiao Zhang
Ruoxi Jiang
Rebecca Willett
Michael Maire
BDL
DiffM
365
1
0
08 Dec 2024
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ziwei Huang
Wanggui He
Quanyu Long
Yandi Wang
Haoyuan Li
...
Fangxun Shu
Long Chen
Hao Jiang
Yaoyao Yu
Leilei Gan
EGVM
1.1K
9
0
05 Dec 2024
MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
Shan Yang
DiffM
213
0
0
02 Dec 2024
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud
Sergey Lavrushkin
Alexey Kirillov
D. Vatolin
484
0
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
664
19
0
02 Dec 2024
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Computer Vision and Pattern Recognition (CVPR), 2024
Xin Xie
Dong Gong
582
13
0
01 Dec 2024
Continuous Concepts Removal in Text-to-image Diffusion Models
Tingxu Han
Weisong Sun
Yanrong Hu
Chunrong Fang
Yonglong Zhang
Shiqing Ma
Tao Zheng
Zhenyu Chen
Zhenting Wang
DiffM
534
3
0
30 Nov 2024
DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Shwetha Ram
T. Neiman
Qianli Feng
Andrew Stuart
S. D. Tran
Trishul Chilimbi
315
5
0
28 Nov 2024
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Computer Vision and Pattern Recognition (CVPR), 2024
Weimin Qiu
Jieke Wang
Meng Tang
DiffM
427
8
0
28 Nov 2024
Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
Tianyi Wei
Dongdong Chen
Yifan Zhou
Xingang Pan
EGVM
261
10
0
27 Nov 2024
ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts
Uy Dieu Tran
Minh Luu
P. Nguyen
K. Nguyen
Binh-Son Hua
DiffM
454
1
0
27 Nov 2024