Scaling Autoregressive Models for Content-Rich Text-to-Image Generation


22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
X&Fuse: Fusing Visual Information in Text-to-Image Generation
Yuval Kirstain
Omer Levy
Adam Polyak
DiffM
98
6
0
02 Mar 2023
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Neural Information Processing Systems (NeurIPS), 2023
Diederik P. Kingma
Ruiqi Gao
DiffM
765
238
0
01 Mar 2023
StraIT: Non-autoregressive Generation with Stratified Image Transformer
Shengju Qian
Huiwen Chang
Yuanzhen Li
Zizhao Zhang
Jiaya Jia
Han Zhang
221
13
0
01 Mar 2023
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Chao Xue
Wen Liu
Shunxing Xie
Zhenfang Wang
Jiaxing Li
...
Shi-Yong Chen
Yibing Zhan
Jing Zhang
Chaoyue Wang
Dacheng Tao
232
3
0
01 Mar 2023
Benchmarking Deepart Detection
Yabin Wang
Zhiwu Huang
Xiaopeng Hong
200
14
0
28 Feb 2023
Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods
European Conference on Computer Vision (ECCV), 2023
Wonwoong Cho
Hareesh Ravi
Midhun Harikumar
V. Khuc
Krishna Kumar Singh
Jingwan Lu
David I. Inouye
Ajinkya Kale
DiffM
528
8
0
28 Feb 2023
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuxiang Wei
Yabo Zhang
Zhilong Ji
Jinfeng Bai
Lei Zhang
W. Zuo
DiffM
303
429
0
27 Feb 2023
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
ACM Transactions on Graphics (TOG), 2023
Rinon Gal
Moab Arar
Yuval Atzmon
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
DiffM
459
238
0
23 Feb 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
339
385
0
23 Feb 2023
Teaching CLIP to Count to Ten
IEEE International Conference on Computer Vision (ICCV), 2023
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM, CLIP
472
161
0
23 Feb 2023
Controlled and Conditional Text to Image Generation with Diffusion Prior
Pranav Aggarwal
Hareesh Ravi
Naveen Marri
Sachin Kelkar
F. Chen
...
Alvin Ghouas
Sarah Saber
Malavika Ramprasad
Baldo Faieta
Ajinkya Kale
DiffM
270
7
0
23 Feb 2023
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Xiaodong Wang
Chenfei Wu
S. Yin
Minheng Ni
Jianfeng Wang
...
Fan Yang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen, DiffM
196
11
0
21 Feb 2023
Composer: Creative and Controllable Image Synthesis with Composable Conditions
International Conference on Machine Learning (ICML), 2023
Lianghua Huang
Di Chen
Yu Liu
Yujun Shen
Deli Zhao
Jingren Zhou
DiffM
423
355
0
20 Feb 2023
Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales
Italian Research Conference on Digital Library Management Systems (IRCDL), 2023
Martin Ruskov
DiffM
220
24
0
17 Feb 2023
Text-driven Visual Synthesis with Latent Diffusion Prior
Tingbo Liao
Songwei Ge
Yiran Xu
Yao-Chih Lee
Badour Albahar
Jia-Bin Huang
DiffM
224
6
0
16 Feb 2023
Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension
Henry Kvinge
Davis Brown
Charles Godfrey
DiffM
156
6
0
16 Feb 2023
DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
Neural Information Processing Systems (NeurIPS), 2023
Zhiqing Sun
Yiming Yang
DiffM
321
209
0
16 Feb 2023
PRedItOR: Text Guided Image Editing with Diffusion Prior
Hareesh Ravi
Sachin Kelkar
Midhun Harikumar
Ajinkya Kale
DiffM
251
14
0
15 Feb 2023
Self-Organising Neural Discrete Representation Learning à la Kohonen
International Conference on Artificial Neural Networks (ICANN), 2023
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
SSL
310
1
0
15 Feb 2023
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
IEEE International Conference on Computer Vision (ICCV), 2023
Kyle Sargent
Jing Yu Koh
Han Zhang
Huiwen Chang
Charles Herrmann
Pratul P. Srinivasan
Jiajun Wu
Deqing Sun
207
32
0
14 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Journal of Computing and Information Science in Engineering (JCISE), 2023
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
356
64
0
14 Feb 2023
From paintbrush to pixel: A review of deep neural networks in AI-generated art
Anne-Sofie Maerten
Derya Soydaner
280
30
0
14 Feb 2023
MaskSketch: Unpaired Structure-guided Masked Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
D. Bashkirova
José Lezama
Kihyuk Sohn
Kate Saenko
Irfan Essa
DiffM
204
36
0
10 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
International Conference on Machine Learning (ICML), 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
407
774
0
10 Feb 2023
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Qingqing Huang
Daniel S. Park
Tao Wang
Timo I. Denk
Andy Ly
...
Jesse Engel
Quoc V. Le
William Chan
Zhifeng Chen
Wei Han
MGen, DiffM
353
244
0
08 Feb 2023
Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models
Hyeonho Jeong
Gihyun Kwon
Jong Chul Ye
172
30
0
08 Feb 2023
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
Felix Friedrich
Manuel Brack
Lukas Struppek
Dominik Hintersdorf
P. Schramowski
Sasha Luccioni
Kristian Kersting
299
169
0
07 Feb 2023
Zero-shot Image-to-Image Translation
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023
Gaurav Parmar
Krishna Kumar Singh
Richard Y. Zhang
Yijun Li
Jingwan Lu
Jun-Yan Zhu
DiffM
309
561
0
06 Feb 2023
Structure and Content-Guided Video Synthesis with Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Patrick Esser
Johnathan Chiu
Parmida Atighehchian
Jonathan Granskog
Anastasis Germanidis
DiffM, VGen
380
665
0
06 Feb 2023
Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion
Zuopeng Yang
Tianshu Chu
Xin Lin
Erdun Gao
Daqing Liu
J. Yang
Chaoyue Wang
DiffM
230
32
0
05 Feb 2023
Dreamix: Video Diffusion Models are General Video Editors
Eyal Molad
Eliahu Horwitz
Dani Valevski
Alex Rav-Acha
Yossi Matias
Yael Pritch
Yaniv Leviathan
Yedid Hoshen
DiffM, VGen
311
216
0
02 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Neural Information Processing Systems (NeurIPS), 2023
Hao Liu
Wilson Yan
Pieter Abbeel
254
34
0
02 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
International Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
444
150
0
31 Jan 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
ACM Transactions on Graphics (TOG), 2023
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
582
669
0
31 Jan 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Ming Tao
Bingkun Bao
Hao Tang
Changsheng Xu
DiffM, VLM
239
137
0
30 Jan 2023
MusicLM: Generating Music From Text
A. Agostinelli
Timo I. Denk
Zalan Borsos
Jesse Engel
Mauro Verzetti
...
Adam Roberts
Marco Tagliasacchi
Matthew Sharifi
Neil Zeghidour
Christian Frank
MGen
747
601
0
26 Jan 2023
Text-To-4D Dynamic Scene Generation
International Conference on Machine Learning (ICML), 2023
Uriel Singer
Shelly Sheynin
Adam Polyak
Oron Ashual
Iurii Makarov
...
Naman Goyal
Andrea Vedaldi
Devi Parikh
Justin Johnson
Yaniv Taigman
DiffM
228
209
0
26 Jan 2023
Simple diffusion: End-to-end diffusion for high resolution images
International Conference on Machine Learning (ICML), 2023
Emiel Hoogeboom
Jonathan Heek
Tim Salimans
400
352
0
26 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
International Conference on Machine Learning (ICML), 2023
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
325
266
0
23 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
146
19
0
21 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
436
800
1
17 Jan 2023
Open-vocabulary Object Segmentation with Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyi Li
Qinye Zhou
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
VLM
331
89
0
12 Jan 2023
Latent Autoregressive Source Separation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Andrea Santilli
Luca Cosmo
Emanuele Rodolà
BDL, DRL
173
12
0
09 Jan 2023
MAQA: A Multimodal QA Benchmark for Negation
Judith Yue Li
Aren Jansen
Qingqing Huang
Joonseok Lee
Ravi Ganti
Dima Kuzmin
210
6
0
09 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Feng
Ziyang Chen
Andrew Owens
272
112
0
04 Jan 2023
Attribute-Centric Compositional Text-to-Image Generation
International Journal of Computer Vision (IJCV), 2023
Yuren Cong
Martin Renqiang Min
Erran L. Li
Bodo Rosenhahn
M. Yang
245
18
0
04 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
International Conference on Machine Learning (ICML), 2023
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
488
697
0
02 Jan 2023
Multi-Realism Image Compression with a Conditional Generator
Computer Vision and Pattern Recognition (CVPR), 2022
E. Agustsson
David C. Minnen
G. Toderici
Fabian Mentzer
254
95
0
28 Dec 2022
Do DALL-E and Flamingo Understand Each Other?
IEEE International Conference on Computer Vision (ICCV), 2022
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
226
14
0
23 Dec 2022
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
IEEE International Conference on Computer Vision (ICCV), 2022
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
352
1,011
0
22 Dec 2022