Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,011 papers shown
Looking at words and points with attention: a benchmark for text-to-shape coherence
Andrea Amaduzzi
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
149
3
0
14 Sep 2023
Masked Generative Modeling with Enhanced Sampling Scheme
Daesoo Lee
Erlend Aune
Sara Malacarne
DiffM
193
4
0
14 Sep 2023
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
International Conference on Learning Representations (ICLR), 2023
Xingchao Liu
Xiwen Zhang
Jianzhu Ma
Jian Peng
Qiang Liu
595
312
0
12 Sep 2023
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Li Chen
Mengyi Zhao
Yiheng Liu
Mingxu Ding
Yangyang Song
...
Xu Wang
Hao Yang
Jing Liu
Kang Du
Min Zheng
DiffM
166
75
0
11 Sep 2023
ITI-GEN: Inclusive Text-to-Image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Zhang
Xuanbai Chen
Siqi Chai
Chen Henry Wu
Dmitry Lagun
Thabo Beeler
Fernando de la Torre
VLM
251
79
0
11 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
International Conference on Machine Learning (ICML), 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
379
717
0
11 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
International Conference on Learning Representations (ICLR), 2023
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Chen Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM, VLM
240
76
0
09 Sep 2023
MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
International Journal of Computer Vision (IJCV), 2023
Yupeng Zhou
Daquan Zhou
Zuo-Liang Zhu
Yaxing Wang
Qibin Hou
Jiashi Feng
171
13
0
08 Sep 2023
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Jiapeng Zhu
Ceyuan Yang
Kecheng Zheng
Yinghao Xu
Zifan Shi
Yujun Shen
MoE
262
14
0
07 Sep 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu-Gang Jiang
Hang Xu
DiffM, VGen
224
57
0
07 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
277
164
0
05 Sep 2023
Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface
Atieh Taheri
Mohammad Izadi
Gururaj Shriram
Negar Rostamzadeh
Shaun Kane
DiffM
168
3
0
05 Sep 2023
MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty
Amit Raj
James Hays
151
0
0
03 Sep 2023
Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities
Shanyuan Liu
Dawei Leng
Yuhui Yin
DiffM
140
9
0
02 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Fengxiang Bie
Jianlong Wu
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
254
56
0
02 Sep 2023
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
Xin Li
Wenqing Chu
Ye Wu
Weihang Yuan
Fanglong Liu
Tao Gui
Fu Li
Haocheng Feng
Errui Ding
Jingdong Wang
VGen
332
70
0
01 Sep 2023
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
IEEE International Conference on Computer Vision (ICCV), 2023
Cuican Yu
Guansong Lu
Yihan Zeng
Jian Sun
Xiaodan Liang
Huibin Li
Zongben Xu
Songcen Xu
Wei Zhang
Hang Xu
231
19
0
31 Aug 2023
Priority-Centric Human Motion Generation in Discrete Latent Space
IEEE International Conference on Computer Vision (ICCV), 2023
Hanyang Kong
Kehong Gong
Dongze Lian
Michael Bi Mi
Xinchao Wang
DiffM
448
69
0
28 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Lin Geng Foo
Hossein Rahmani
Jing Liu
760
47
0
27 Aug 2023
Dense Text-to-Image Generation with Attention Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Yunji Kim
Jiyoung Lee
Jin-Hwa Kim
Jung-Woo Ha
Jun-Yan Zhu
DiffM
281
181
0
24 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
International Conference on Learning Representations (ICLR), 2023
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM, VLM
276
77
0
23 Aug 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Neural Information Processing Systems (NeurIPS), 2023
Emanuele Bugliarello
Hernan Moraldo
Ruben Villegas
Mohammad Babaeizadeh
M. Saffar
Han Zhang
D. Erhan
V. Ferrari
Pieter-Jan Kindermans
P. Voigtlaender
VGen
332
15
0
22 Aug 2023
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Xujie Zhang
Binbin Yang
Michael C. Kampffmeyer
Wenqing Zhang
Shiyue Zhang
Guansong Lu
Liang Lin
Hang Xu
Xiaodan Liang
DiffM
396
17
0
22 Aug 2023
Backdooring Textual Inversion for Concept Censorship
Yutong Wu
Jiehan Zhang
Florian Kerschbaum
Tianwei Zhang
DiffM
269
12
0
21 Aug 2023
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zhen Xing
Jingdong Sun
Hang-Rui Hu
Zuxuan Wu
Yu-Gang Jiang
VGen, DiffM
268
107
0
18 Aug 2023
Edit Temporal-Consistent Videos with Image Diffusion Model
Yuan-Zheng Wang
Yong Li
Xiaoya Zhang
Xin Liu
Anbo Dai
Antoni B. Chan
Zhen Cui
DiffM
260
12
0
17 Aug 2023
Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment
Qi Chen
Chaorui Deng
Zixiong Huang
Bowen Zhang
Zhuliang Yu
Qi Wu
EGVM
205
0
0
16 Aug 2023
Painter: Teaching Auto-regressive Language Models to Draw Sketches
Reza Pourreza
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Pulkit Madan
Roland Memisevic
174
6
0
16 Aug 2023
Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation
ACM Multimedia (ACM MM), 2023
Alexander Martin
Haitian Zheng
Jie An
Jiebo Luo
VLM, DiffM
173
1
0
14 Aug 2023
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Sadeep Jayasumana
Daniel Glasner
Srikumar Ramalingam
Andreas Veit
Ayan Chakrabarti
Surinder Kumar
DiffM
284
0
0
14 Aug 2023
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Hu Ye
Jun Zhang
Siyi Liu
Xiao Han
Wei Yang
DiffM
323
1,264
0
13 Aug 2023
White-box Membership Inference Attacks against Diffusion Models
Proceedings on Privacy Enhancing Technologies (PoPETs), 2023
Yan Pang
Tianhao Wang
Xu Kang
Mengdi Huai
Yang Zhang
AAML, DiffM
289
38
0
11 Aug 2023
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
ACM Multimedia (ACM MM), 2023
Leigang Qu
Shengqiong Wu
Hao Fei
Liqiang Nie
Tat-Seng Chua
LM&Ro, DiffM, MLLM
324
130
0
09 Aug 2023
Circumventing Concept Erasure Methods For Text-to-Image Generative Models
International Conference on Learning Representations (ICLR), 2023
Minh Pham
Kelly O. Marshall
Niv Cohen
Govind Mittal
Chinmay Hegde
DiffM
238
67
0
03 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
IEEE International Conference on Computer Vision (ICCV), 2023
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
320
21
0
31 Jul 2023
Visual Instruction Inversion: Image Editing via Visual Prompting
Thao Nguyen
Yuheng Li
Utkarsh Ojha
Yong Jae Lee
DiffM
142
31
0
26 Jul 2023
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
IEEE International Conference on Computer Vision (ICCV), 2023
Shilin Lu
Yanzhu Liu
A. Kong
622
185
0
24 Jul 2023
Divide & Bind Your Attention for Improved Generative Semantic Nursing
British Machine Vision Conference (BMVC), 2023
Yumeng Li
Margret Keuper
Dan Zhang
Anna Khoreva
DiffM
339
77
0
20 Jul 2023
Text2Layer: Layered Image Generation using Latent Diffusion Model
Xinyang Zhang
Wentian Zhao
Xin Lu
J. Chien
DiffM
193
27
0
19 Jul 2023
Beyond the ML Model: Applying Safety Engineering Frameworks to Text-to-Image Development
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Shalaleh Rismani
Renee Shelby
A. Smart
Renelito Delos Santos
AJung Moon
Negar Rostamzadeh
228
13
0
19 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Neural Information Processing Systems (NeurIPS), 2023
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wei Cao
Jiacheng Sun
Hao Sun
DiffM
320
17
0
17 Jul 2023
Zero-Shot Image Harmonization with Generative Model Prior
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jianqi Chen
Yilan Zhang
Zhengxia Zou
Keyan Chen
Z. Shi
DiffM
302
9
0
17 Jul 2023
Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?
Neural Information Processing Systems (NeurIPS), 2023
Jialu Gao
Kaizhe Hu
Guowei Xu
Huazhe Xu
LM&Ro
197
21
0
15 Jul 2023
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Computer Vision and Pattern Recognition (CVPR), 2023
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Wei Wei
Tingbo Hou
Yael Pritch
Neal Wadhwa
Michael Rubinstein
Kfir Aberman
DiffM
212
228
0
13 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
359
155
0
11 Jul 2023
Diffusion idea exploration for art generation
N. Verma
DiffM
225
1
0
11 Jul 2023
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
Neural Information Processing Systems (NeurIPS), 2023
Jaskirat Singh
Liang Zheng
299
37
0
10 Jul 2023
DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer
Dan Ruta
Gemma Canet Tarrés
Andrew Gilbert
Eli Shechtman
Nicholas I. Kolkin
John Collomosse
DiffM
348
8
0
09 Jul 2023
Text-Guided Synthesis of Eulerian Cinemagraphs
ACM Transactions on Graphics (TOG), 2023
Aniruddha Mahapatra
Aliaksandr Siarohin
Hsin-Ying Lee
Sergey Tulyakov
Sitong Su
DiffM, VGen
206
24
0
06 Jul 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
International Conference on Learning Representations (ICLR), 2023
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
1.7K
3,833
0
04 Jul 2023