Scaling Autoregressive Models for Content-Rich Text-to-Image Generation


22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
A Neural Space-Time Representation for Text-to-Image Personalization
ACM Transactions on Graphics (TOG), 2023
Yuval Alaluf
Elad Richardson
G. Metzer
Daniel Cohen-Or
DiffM
330
128
0
24 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
388
54
0
24 May 2023
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tuhin Chakrabarty
Arkadiy Saakyan
Olivia Winn
Artemis Panagopoulou
Yue Yang
Marianna Apidianaki
Smaranda Muresan
DiffM
204
62
0
24 May 2023
Vision + Language Applications: A Survey
Yutong Zhou
N. Shimada
VLM
277
13
0
24 May 2023
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Neural Information Processing Systems (NeurIPS), 2023
Grace Luo
Lisa Dunlap
Dong Huk Park
Aleksander Holynski
Trevor Darrell
421
197
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Neural Information Processing Systems (NeurIPS), 2023
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
321
3
0
23 May 2023
Training Priors Predict Text-To-Image Model Performance
Charles Lovering
Ellie Pavlick
CoGe
214
4
0
23 May 2023
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
Jiuxiang Gu
Ruiyi Zhang
Tongfei Sun
Jinhui Xu
DiffM
259
49
0
23 May 2023
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik
Karsten Roth
Goran Frehse
Zeynep Akata
233
33
0
22 May 2023
ControlVideo: Training-free Controllable Text-to-Video Generation
International Conference on Learning Representations (ICLR), 2023
Yabo Zhang
Yuxiang Wei
Dongsheng Jiang
Xiaopeng Zhang
W. Zuo
Qi Tian
VGen, DiffM
284
330
0
22 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM, SyDa
408
93
0
22 May 2023
The Waymo Open Sim Agents Challenge
Neural Information Processing Systems (NeurIPS), 2023
Nico Montali
John Lambert
Paul Mougin
Alex Kuefler
Nick Rhinehart
...
Tristan Emrich
Zoey Yang
Shimon Whiteson
Brandyn White
Drago Anguelov
LLMAG
453
90
0
19 May 2023
AI's Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
Conference on Fairness, Accountability and Transparency (FAccT), 2023
Rida Qadri
Renee Shelby
Cynthia L. Bennett
Emily Denton
266
95
0
19 May 2023
Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
Computer Vision and Pattern Recognition (CVPR), 2023
Mengqi Huang
Zhendong Mao
Zhuowei Chen
Yongdong Zhang
MQ
272
58
0
19 May 2023
Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots
Jinyi Hu
Xu Han
Xiaoyuan Yi
Yutong Chen
Wenhao Li
Zhiyuan Liu
Maosong Sun
DiffM
77
4
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Artificial Intelligence Review (AIR), 2023
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
352
146
0
19 May 2023
Inspecting the Geographical Representativeness of Images from Text-to-Image Models
IEEE International Conference on Computer Vision (ICCV), 2023
Aparna Basu
R. Venkatesh Babu
Danish Pruthi
DiffM
308
48
0
18 May 2023
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
174
29
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Neural Information Processing Systems (NeurIPS), 2023
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
568
116
0
17 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
117
3
0
16 May 2023
DATED: Guidelines for Creating Synthetic Datasets for Engineering Design Applications
Design Automation Conference (DAC), 2023
Cyril Picard
Jürg Schiffmann
Faez Ahmed
167
14
0
15 May 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Yuyang Zhao
Enze Xie
Lanqing Hong
Zhenguo Li
G. Lee
DiffM, VGen
196
41
0
15 May 2023
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
International Journal of Computer Vision (IJCV), 2023
Jinsheng Zheng
Daqing Liu
Chaoyue Wang
Minghui Hu
Zuopeng Yang
Changxing Ding
Dacheng Tao
149
5
0
10 May 2023
Recommender Systems with Generative Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Shashank Rajput
Nikhil Mehta
Anima Singh
Raghunandan H. Keshavan
T. Vu
...
Vinh Q. Tran
Jonah Samost
Maciej Kula
Ed H. Chi
M. Sathiamoorthy
RALM, 3DV
358
180
0
08 May 2023
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
Yupei Lin
Senyang Zhang
Xiaojun Yang
Tianlin Li
Yukai Shi
DiffM
138
7
0
08 May 2023
Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework
Ruijia Wu
Yuhang Wang
Huafeng Shi
Zhipeng Yu
Yichao Wu
Ding Liang
DiffM
183
11
0
06 May 2023
Controllable Visual-Tactile Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Ruihan Gao
Wenzhen Yuan
Jun-Yan Zhu
DiffM
204
8
0
04 May 2023
Shap-E: Generating Conditional 3D Implicit Functions
Heewoo Jun
Alex Nichol
DiffM
625
414
0
03 May 2023
Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
International Conference on Machine Learning (ICML), 2023
Chao Du
Tianbo Li
Tianyu Pang
Shuicheng Yan
Min Lin
DiffM, BDL
323
14
0
03 May 2023
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
M. S. Seyfioglu
Karim Bouyarmane
Suren Kumar
A. Tavanaei
Ismail B. Tutar
DiffM
183
8
0
02 May 2023
Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Shishi Xiao
Suizi Huang
Yue Lin
Yilin Ye
Weizhen Zeng
346
47
0
28 Apr 2023
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
ACM Transactions on Graphics (TOG), 2023
Rong Wu
Wanchao Su
Kede Ma
Jing Liao
490
62
0
27 Apr 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
421
43
0
27 Apr 2023
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
International Conference on Machine Learning (ICML), 2023
Zhaoyan Liu
Noël Vouitsis
S. Gorti
Jimmy Ba
Gabriel Loaiza-Ganem
ViT
292
2
0
26 Apr 2023
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
Neural Information Processing Systems (NeurIPS), 2023
Zeyu Lu
Di Huang
Mengwei He
Jingjing Qu
Chengzhi Wu
Xihui Liu
Wanli Ouyang
259
94
0
25 Apr 2023
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
International Conference on 3D Vision (3DV), 2023
Christina Tsalicoglou
Fabian Manhardt
A. Tonioni
Michael Niemeyer
F. Tombari
DiffM
192
162
0
24 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa, FedML, SSL
439
362
0
24 Apr 2023
Evolving Three Dimension (3D) Abstract Art: Fitting Concepts by Language
Yingtao Tian
134
1
0
24 Apr 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2023
A. Blattmann
Robin Rombach
Huan Ling
Tim Dockhorn
Seung Wook Kim
Sanja Fidler
Karsten Kreis
3DGS, VGen
610
1,435
0
18 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
1.1K
7,496
0
17 Apr 2023
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
Jie An
Songyang Zhang
Harry Yang
Sonal Gupta
Jia-Bin Huang
Jiebo Luo
Xiaoyue Yin
DiffM, VGen
290
135
0
17 Apr 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Ming Cao
Xintao Wang
Chen Ma
Ying Shan
Xiaohu Qie
Yinqiang Zheng
DiffM
232
673
0
17 Apr 2023
AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics
Shan Jia
Mingzhen Huang
Zhou Zhou
Yan Ju
Jialing Cai
Siwei Lyu
DiffM
264
47
0
14 Apr 2023
Expressive Text-to-Image Generation with Rich Text
IEEE International Conference on Computer Vision (ICCV), 2023
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
482
97
0
13 Apr 2023
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
Jaemin Cho
Linjie Li
Zhengyuan Yang
Zhe Gan
Lijuan Wang
Joey Tianyi Zhou
EGVM
196
9
0
13 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Neural Information Processing Systems (NeurIPS), 2023
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
559
736
0
12 Apr 2023
Gradient-Free Textual Inversion
ACM Multimedia (ACM MM), 2023
Zhengcong Fei
Mingyuan Fan
Junshi Huang
DiffM
260
38
0
12 Apr 2023
Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond
Mohammadreza Armandpour
A. Sadeghian
Huangjie Zheng
Amir Sadeghian
Mingyuan Zhou
DiffM
405
148
0
11 Apr 2023
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Computer Vision and Pattern Recognition (CVPR), 2023
Jing Shi
Wei Xiong
Zhe Lin
H. J. Jung
DiffM
367
366
0
06 Apr 2023
Training-Free Layout Control with Cross-Attention Guidance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
440
313
0
06 Apr 2023