ResearchTrend.AI
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
arXiv: 2406.06525

10 June 2024
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
    VLM

Papers citing "Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation"

27 / 177 papers shown
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
28
147
0
27 Sep 2024
Pixel-Space Post-Training of Latent Diffusion Models
Christina Zhang
Simran Motwani
Matthew Yu
Ji Hou
Felix Juefei-Xu
Sam S. Tsai
Peter Vajda
Zijian He
Jialiang Wang
13
2
0
26 Sep 2024
MonoFormer: One Transformer for Both Diffusion and Autoregression
Chuyang Zhao
Yuxing Song
Wenhao Wang
Haocheng Feng
Errui Ding
Yifan Sun
Xinyan Xiao
Jingdong Wang
DiffM
26
17
0
24 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
41
27
0
24 Sep 2024
DepthART: Monocular Depth Estimation as Autoregressive Refinement Task
Bulat Gabdullin
Nina Konovalova
Nikolay Patakin
Dmitry Senushkin
Anton Konushin
MDE
20
0
0
23 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
44
7
0
23 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
16
9
0
20 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
29
50
0
06 Sep 2024
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving
Julong Wei
Shanshuai Yuan
Pengfei Li
Qingda Hu
Zhongxue Gan
Wenchao Ding
VLM
14
17
0
05 Sep 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
33
159
0
22 Aug 2024
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
37
12
0
22 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Yu Qiao
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
54
48
0
05 Aug 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
Qian Zhang
Xiangzi Dai
Ninghua Yang
Xiang An
Ziyong Feng
Xingyu Ren
VLM
CLIP
30
17
0
02 Aug 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
54
15
0
16 Jul 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi-Xin Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViT
VGen
75
34
0
13 Jun 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
133
298
0
05 Jan 2024
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
515
0
02 Jan 2023
Revisiting Neural Scaling Laws in Language and Vision
Ibrahim M. Alabdulmohsin
Behnam Neyshabur
Xiaohua Zhai
145
101
0
13 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
159
324
0
03 Mar 2022
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer
Katja Schwarz
Andreas Geiger
180
485
0
01 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
262
10,183
0
12 Dec 2018
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
203
19,191
0
21 Nov 2016