arXiv: 2206.10789
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM
ArXiv (abs) | PDF | HTML | HuggingFace (4 upvotes)

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
Ksheeraja Raghavan
Samiran Gode
Ankit Parag Shah
Surabhi Raghavan
Wolfram Burgard
Bhiksha Raj
Rita Singh
247
0
0
04 Oct 2024
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
International Conference on Learning Representations (ICLR), 2024
Doohyuk Jang
Sihwan Park
J. Yang
Yeonsung Jung
Jihun Yun
Souvik Kundu
Sung-Yub Kim
Eunho Yang
463
29
0
04 Oct 2024
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
International Conference on Learning Representations (ICLR), 2024
Zichen Miao
Zhengyuan Yang
Kevin Lin
Ze Wang
Zicheng Liu
Lijuan Wang
Qiang Qiu
400
14
0
04 Oct 2024
CaLMFlow: Volterra Flow Matching using Causal Language Models
Shiyang Zhang
Daniel Levine
Ivan Vrkic
Marco Francesco Bressana
David Zhang
S. Rizvi
Yangtian Zhang
E. Zappala
David van Dijk
150
1
0
03 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
International Conference on Learning Representations (ICLR), 2024
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
675
36
0
03 Oct 2024
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
International Conference on Learning Representations (ICLR), 2024
Seyedmorteza Sadat
Otmar Hilliges
Romann M. Weber
DiffM
402
49
0
03 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang
Tianwei Xiong
Daquan Zhou
Zhijie Lin
Yang Zhao
Bingyi Kang
Jiashi Feng
Xihui Liu
VGen
375
69
0
03 Oct 2024
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
International Conference on Learning Representations (ICLR), 2024
Liang Chen
Sinan Tan
Zefan Cai
Weichu Xie
Haozhe Zhao
Yichi Zhang
Junyang Lin
Jinze Bai
Tianyu Liu
Baobao Chang
ViT
250
7
0
02 Oct 2024
Data Extrapolation for Text-to-image Generation on Small Datasets
Senmao Ye
Fei Liu
246
1
0
02 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
International Conference on Learning Representations (ICLR), 2024
Yao Teng
Han Shi
Xian Liu
Xuefei Ning
Guohao Dai
Yu Wang
Zhenguo Li
Xihui Liu
380
42
0
02 Oct 2024
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
International Conference on Learning Representations (ICLR), 2024
Haotian Sun
Tao Lei
Bowen Zhang
Yanghao Li
Haoshuo Huang
Ruoming Pang
Bo Dai
Nan Du
DiffM
MoE
703
17
0
02 Oct 2024
MCGM: Mask Conditional Text-to-Image Generative Model
Rami Skaik
Leonardo Rossi
Tomaso Fontanini
Andrea Prati
DiffM
123
1
0
01 Oct 2024
CusConcept: Customized Visual Concept Decomposition with Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Zhi Xu
Shaozhe Hao
Kai Han
DiffM
254
6
0
01 Oct 2024
Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
Neural Information Processing Systems (NeurIPS), 2024
Chenyi Zhuang
Ying Hu
Pan Gao
DiffM
VLM
293
18
0
30 Sep 2024
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation
Wenchao Chen
Liqiang Niu
Ziyao Lu
Fandong Meng
Jie Zhou
Mamba
289
9
0
30 Sep 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
290
483
0
27 Sep 2024
Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey
Yi Zhang
Zhen Chen
Chih-Hong Cheng
Wenjie Ruan
Xiaowei Huang
Dezong Zhao
David Flynn
Siddartha Khastgir
Xingyu Zhao
MedIm
464
6
0
26 Sep 2024
Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule
Hongtao Huang
Xiaojun Chang
Weitong Chen
317
0
0
26 Sep 2024
MonoFormer: One Transformer for Both Diffusion and Autoregression
Chuyang Zhao
Yuxing Song
Wenhao Wang
Haocheng Feng
Errui Ding
Yifan Sun
Xinyan Xiao
Jingdong Wang
DiffM
234
39
0
24 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
213
71
0
24 Sep 2024
TFG: Unified Training-Free Guidance for Diffusion Models
Neural Information Processing Systems (NeurIPS), 2024
Haotian Ye
Haowei Lin
Jiaqi Han
Minkai Xu
Sheng Liu
Yitao Liang
Jianzhu Ma
James Zou
Stefano Ermon
193
55
0
24 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Neural Information Processing Systems (NeurIPS), 2024
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe
VLM
505
5
0
19 Sep 2024
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
AAAI Conference on Artificial Intelligence (AAAI), 2024
Haoyuan Sun
Bo Xia
Yongzhe Chang
Xueqian Wang
EGVM
249
19
0
15 Sep 2024
TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Zihan Su
Junhao Zhuang
Chun Yuan
DiffM
369
0
0
15 Sep 2024
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Yaning Zhang
Tianyi Wang
Zitong Yu
Zan Gao
Linlin Shen
Shengyong Chen
DiffM
313
11
0
15 Sep 2024
G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Jinzhi Zhang
Feng Xiong
Mu Xu
260
8
0
10 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Neural Information Processing Systems (NeurIPS), 2024
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
285
18
0
09 Sep 2024
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
International Conference on Learning Representations (ICLR), 2024
Haiyu Wu
Jaskirat Singh
Sicong Tian
Liang Zheng
Kevin W. Bowyer
CVBM
609
12
0
04 Sep 2024
Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
Vage Egiazarian
Denis Kuznedelev
Anton Voronov
Ruslan Svirschevski
Michael Goin
Daniil Pavlov
Dan Alistarh
Dmitry Baranchuk
MQ
224
1
0
31 Aug 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
European Conference on Computer Vision (ECCV), 2024
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
349
15
0
31 Aug 2024
One-Shot Learning Meets Depth Diffusion in Multi-Object Videos
Anisha Jain
VGen
DiffM
MDE
138
1
0
29 Aug 2024
Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators
Nikita Kister
István Sárándi
Anna Khoreva
Gerard Pons-Moll
297
2
0
28 Aug 2024
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
European Conference on Computer Vision (ECCV), 2024
T. Dao
Thuan Hoang Nguyen
T. Le
D. Vu
Khoi Nguyen
Cuong Pham
Anh Tran
DiffM
318
33
0
26 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
265
291
0
20 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
344
33
0
17 Aug 2024
Can Large Language Models Understand Symbolic Graphics Programs?
International Conference on Learning Representations (ICLR), 2024
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
602
28
0
15 Aug 2024
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning
Pattern Recognition (Pattern Recogn.), 2024
Hao Sun
Yu Song
Jiaqing Liu
Jihong Hu
Yen-Wei Chen
Lanfen Lin
VLM
280
0
0
06 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
418
111
0
05 Aug 2024
LEGO: Self-Supervised Representation Learning for Scene Text Images
Yujin Ren
Jiaxin Zhang
Lianwen Jin
SSL
252
0
0
04 Aug 2024
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
AAAI Conference on Artificial Intelligence (AAAI), 2024
Honglei Miao
Fan Ma
Ruijie Quan
Kun Zhan
Yi Yang
AAML
289
8
0
01 Aug 2024
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
International Conference on Learning Representations (ICLR), 2024
Yangzhen Wu
Neil G. Marchant
Shanda Li
Sean Welleck
Yiming Yang
138
0
0
01 Aug 2024
Fine-gained Zero-shot Video Sampling
Dengsheng Chen
Jie Hu
Javier Segovia-Aguas
Enhua Wu
VGen
DiffM
175
0
0
31 Jul 2024
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training
Rivik Setty
Chengjin Xu
Vinay Setty
Jian Guo
273
28
0
31 Jul 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
European Conference on Computer Vision (ECCV), 2024
Lorenzo Baraldi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
259
29
0
29 Jul 2024
Diffusion Models for Multi-Task Generative Modeling
Changyou Chen
Han Ding
Bunyamin Sisman
Yi Tian Xu
Ouye Xie
Benjamin Z. Yao
Son Dinh Tran
Belinda Zeng
DiffM
224
9
0
24 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
244
25
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
373
28
0
22 Jul 2024
LSReGen: Large-Scale Regional Generator via Backward Guidance Framework
Bowen Zhang
Cheng Yang
Xuanhui Liu
DiffM
187
0
0
21 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MA
OffRL
490
84
0
20 Jul 2024
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking
Zhiyuan Ma
Guoli Jia
Biqing Qi
Bowen Zhou
WIGM
392
17
0
18 Jul 2024