Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
Lafite2: Few-shot Text-to-Image Generation
Jiuxiang Gu
Chunyuan Li
Changyou Chen
Jianfeng Gao
Jinhui Xu
DiffM
192
14
0
25 Oct 2022
Vitruvio: 3D Building Meshes via Single Perspective Sketches
Alberto Tono
Heyaojing Huang
Ashwin Agrawal
Martin Fischer
265
6
0
24 Oct 2022
Instance-Aware Image Completion
Ji-Ho Cho
Minguk Kang
Vibhav Vineet
Jaesik Park
ISeg, VLM
191
2
0
22 Oct 2022
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zekun Li
Jina Kim
Yao-Yi Chiang
Muhao Chen
258
43
0
21 Oct 2022
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
Vivian Liu
Jo Vermeulen
G. Fitzmaurice
Justin Matejka
HAI
277
155
0
20 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
International Conference on Learning Representations (ICLR), 2022
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
160
31
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
312
73
0
20 Oct 2022
OCR-VQGAN: Taming Text-within-Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Juan A. Rodriguez
David Vazquez
I. Laradji
M. Pedersoli
Pau Rodríguez López
266
30
0
19 Oct 2022
Optimizing Hierarchical Image VAEs for Sample Quality
Eric Luhman
Troy Luhman
DRL
173
5
0
18 Oct 2022
Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
International Conference on Intelligent User Interfaces (IUI), 2022
Hyung-Kwon Ko
Gwanmo Park
Hyeon Jeon
Jaemin Jo
Juho Kim
Jinwook Seo
477
189
0
16 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
890
4,531
0
16 Oct 2022
DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
Conference on Computer and Communications Security (CCS), 2022
Zeyang Sha
Zheng Li
Ning Yu
Yang Zhang
DiffM
213
199
0
13 Oct 2022
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
218
38
0
11 Oct 2022
Markup-to-Image Diffusion Models with Scheduled Sampling
International Conference on Learning Representations (ICLR), 2022
Yuntian Deng
Noriyuki Kojima
Alexander M. Rush
DiffM
186
6
0
11 Oct 2022
Can Artificial Intelligence Reconstruct Ancient Mosaics?
Studies in Conservation (SIC), 2022
Fernando Moral-Andrés
Elena Merino-Gómez
Pedro Reviriego
Fabrizio Lombardi
90
9
0
07 Oct 2022
On Distillation of Guided Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Chenlin Meng
Robin Rombach
Ruiqi Gao
Diederik P. Kingma
Stefano Ermon
Jonathan Ho
Tim Salimans
VLM, DiffM
249
697
0
06 Oct 2022
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
LM&Ro
415
61
0
06 Oct 2022
Phenaki: Variable Length Video Generation From Open Domain Textual Description
International Conference on Learning Representations (ICLR), 2022
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM, VGen
362
486
0
05 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
441
1,862
0
05 Oct 2022
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
301
4
0
05 Oct 2022
Visual Prompt Tuning for Generative Transfer Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Kihyuk Sohn
Yuan Hao
José Lezama
Luisa F. Polanía
Huiwen Chang
Han Zhang
Irfan Essa
Lu Jiang
VPVLM, VLM
324
105
0
03 Oct 2022
Membership Inference Attacks Against Text-to-image Generation Models
Yixin Wu
Ning Yu
Zheng Li
Michael Backes
Yang Zhang
DiffM
197
79
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
International Conference on Learning Representations (ICLR), 2022
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
410
392
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
148
68
0
30 Sep 2022
DreamFusion: Text-to-3D using 2D Diffusion
International Conference on Learning Representations (ICLR), 2022
Ben Poole
Ajay Jain
Jonathan T. Barron
B. Mildenhall
879
3,151
0
29 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
International Conference on Learning Representations (ICLR), 2022
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM, VGen
298
1,795
0
29 Sep 2022
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
International Conference on Learning Representations (ICLR), 2022
Wenhu Chen
Hexiang Hu
Chitwan Saharia
William W. Cohen
VLM
568
230
0
29 Sep 2022
Learning to Learn with Generative Models of Neural Network Checkpoints
William S. Peebles
Ilija Radosavovic
Tim Brooks
Alexei A. Efros
Jitendra Malik
UQCV
272
83
0
26 Sep 2022
All are Worth Words: A ViT Backbone for Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Fan Bao
Shen Nie
Kaiwen Xue
Yue Cao
Chongxuan Li
Hang Su
Jun Zhu
VLM
553
499
0
25 Sep 2022
Extremely Simple Activation Shaping for Out-of-Distribution Detection
International Conference on Learning Representations (ICLR), 2022
Andrija Djurisic
Nebojsa Bozanic
Arjun Ashok
Rosanne Liu
OODD
412
201
0
20 Sep 2022
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Journal of Artificial Intelligence Research (JAIR), 2022
Lukas Struppek
Dominik Hintersdorf
Felix Friedrich
Manuel Brack
P. Schramowski
Kristian Kersting
389
41
0
19 Sep 2022
Does CLIP Know My Face?
Journal of Artificial Intelligence Research (JAIR), 2022
Dominik Hintersdorf
Lukas Struppek
Manuel Brack
Felix Friedrich
P. Schramowski
Kristian Kersting
VLM
261
17
0
15 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
397
819
0
07 Sep 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
1.0K
3,756
0
25 Aug 2022
Text to Image Generation: Leaving no Language Behind
Pedro Reviriego
Elena Merino-Gómez
VLM
131
15
0
19 Aug 2022
Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines
European Conference on Software Architecture (ECSA), 2022
Patrick Flynn
T. Vanderbruggen
C. Liao
Pei-Hung Lin
M. Emani
Xipeng Shen
210
5
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Neural Information Processing Systems (NeurIPS), 2022
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP, VLM
569
122
0
10 Aug 2022
Adversarial Attacks on Image Generation With Made-Up Words
Raphael Milliere
228
42
0
04 Aug 2022
DALLE-URBAN: Capturing the urban design expertise of large text to image transformers
Sachith Seneviratne
Damith A. Senanayake
Sanka Rasnayaka
Rajith Vidanaarachchi
Jason Thompson
ViT
258
28
0
03 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
International Conference on Learning Representations (ICLR), 2022
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
719
2,333
0
02 Aug 2022
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
International Conference on Learning Representations (ICLR), 2022
Rinon Gal
Yuval Alaluf
Yuval Atzmon
Or Patashnik
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
498
2,443
0
02 Aug 2022
Lighting (In)consistency of Paint by Text
Hany Farid
160
39
0
27 Jul 2022
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
260
91
0
26 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Neural Information Processing Systems (NeurIPS), 2022
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
223
95
0
20 Jul 2022
Perspective (In)consistency of Paint by Text
Hany Farid
DiffM
202
43
0
27 Jun 2022
Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance
Patterns (Patterns), 2022
N. Corrêa
Camila Galvão
J. Santos
C. Pino
Edson Pontes Pinto
...
Diogo Massmann
Rodrigo Mambrini
Luiza Galvao
Edmund Terem
Nythamar Fernandes de Oliveira
450
181
0
23 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD, VLM, MLLM
469
473
0
17 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
International Conference on Learning Representations (ICLR), 2022
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLM, AI4CE
294
19
0
15 Jun 2022
Blended Latent Diffusion
ACM Transactions on Graphics (TOG), 2022
Omri Avrahami
Ohad Fried
Dani Lischinski
DiffM
373
490
0
06 Jun 2022
Parallel Synthesis for Autoregressive Speech Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Po-Chun Hsu
Da-Rong Liu
Andy T. Liu
Hung-yi Lee
270
6
0
25 Apr 2022