ResearchTrend.AI
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM
arXiv: 2206.10789

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 865 papers shown
Generative AI in Vision: A Survey on Models, Metrics and Applications
Gaurav Raut
Apoorv Singh
VLM
MedIm
36
6
0
26 Feb 2024
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Chun-Hsiao Yeh
Ta-Ying Cheng
He-Yen Hsieh
Chuan-En Lin
Yi Ma
Andrew Markham
Niki Trigoni
H. T. Kung
Yubei Chen
DiffM
25
3
0
23 Feb 2024
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Yi Zhang
Yun Tang
Wenjie Ruan
Xiaowei Huang
Siddartha Khastgir
P. Jennings
Xingyu Zhao
AAML
24
4
0
23 Feb 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace
Aliaksandr Siarohin
Ivan Skorokhodov
Ekaterina Deyneka
Tsai-Shien Chen
...
Yuwei Fang
A. Stoliar
Elisa Ricci
Jian Ren
Sergey Tulyakov
VGen
38
56
0
22 Feb 2024
Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models
C. Wu
Fernando De la Torre
DiffM
19
2
0
21 Feb 2024
CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting
Peter Schaldenbrand
Gaurav Parmar
Jun-Yan Zhu
James McCann
Jean Oh
19
13
0
21 Feb 2024
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Huizhuo Yuan
Zixiang Chen
Kaixuan Ji
Quanquan Gu
55
24
0
15 Feb 2024
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Yutaro Yamada
Khyathi Raghavi Chandu
Yuchen Lin
Jack Hessel
Ilker Yildirim
Yejin Choi
AI4CE
23
12
0
14 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
29
57
0
13 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
33
72
0
12 Feb 2024
Rolling Diffusion Models
David Ruhe
Jonathan Heek
Tim Salimans
Emiel Hoogeboom
DiffM
23
32
0
12 Feb 2024
Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
Aven Le Zhou
Yu-Ao Wang
Wei Wu
Kang Zhang
19
1
0
09 Feb 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
20
57
0
08 Feb 2024
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang
Fan Ma
Linchao Zhu
Yingying Deng
Yi Yang
DiffM
21
22
0
01 Feb 2024
Spatial-Aware Latent Initialization for Controllable Image Generation
Wenqiang Sun
Tengtao Li
Zehong Lin
Jun Zhang
29
10
0
29 Jan 2024
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
Senthil Purushwalkam
Akash Gokul
Shafiq R. Joty
Nikhil Naik
DiffM
29
16
0
25 Jan 2024
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang
Eric Tzeng
Yilun Du
Dmitry Kislyuk
VLM
26
29
0
20 Jan 2024
Make-A-Shape: a Ten-Million-scale 3D Shape Model
Ka-Hei Hui
Aditya Sanghi
Arianna Rampini
Kamal Rahimi Malekshan
Zhengzhe Liu
Hooman Shayani
Chi-Wing Fu
DiffM
21
17
0
20 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
83
41
0
18 Jan 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
50
24
0
18 Jan 2024
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Xiaofeng Wang
Zheng Zhu
Guan Huang
Boyuan Wang
Xinze Chen
Jiwen Lu
VGen
27
32
0
18 Jan 2024
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
Antoine Mercier
Ramin Nakhli
Mahesh Reddy
R. Yasarla
Hong Cai
Fatih Porikli
Guillaume Berger
DiffM
22
15
0
15 Jan 2024
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee
Yinxiao Li
Junjie Ke
Innfarn Yoo
Han Zhang
...
Junfeng He
Gang Li
Sangpil Kim
Irfan Essa
Feng Yang
EGVM
27
18
0
11 Jan 2024
Concept Alignment
Sunayana Rane
Polyphony J. Bruna
Ilia Sucholutsky
Christopher Kello
Thomas L. Griffiths
CVBM
20
7
0
09 Jan 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
Dingning Liu
Xiaoshui Huang
Yuenan Hou
Zhihui Wang
Zhen-fei Yin
Yongshun Gong
Peng Gao
Wanli Ouyang
11
8
0
09 Jan 2024
Improving Diffusion-Based Image Synthesis with Context Prediction
Ling Yang
Jingwei Liu
Shenda Hong
Zhilong Zhang
Zhilin Huang
Zheming Cai
Wentao Zhang
Bin Cui
DiffM
38
33
0
04 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
33
42
0
03 Jan 2024
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai
Xichen Pan
Sainan Liu
Daniele Panozzo
Saining Xie
30
17
0
02 Jan 2024
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin
Zhilu Zhang
Yuxiang Wei
Dongwei Ren
Dongsheng Jiang
Wangmeng Zuo
21
25
0
28 Dec 2023
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li
Bo-Wen Zeng
Yutang Feng
Sicheng Gao
Xuhui Liu
...
Li Lin
Xu Tang
Yao Hu
Jianzhuang Liu
Baochang Zhang
DiffM
18
30
0
28 Dec 2023
Semantic Guidance Tuning for Text-To-Image Diffusion Models
Hyun Kang
Dohae Lee
Myungjin Shin
In-Kwon Lee
19
1
0
26 Dec 2023
Cross Initialization for Personalized Text-to-Image Generation
Lianyu Pang
Jian Yin
Haoran Xie
Qiping Wang
Qing Li
Xudong Mao
DiffM
16
7
0
26 Dec 2023
Emage: Non-Autoregressive Text-to-Image Generation
Zhangyin Feng
Runyi Hu
Liangxin Liu
Fan Zhang
Duyu Tang
Yong Dai
Xiaocheng Feng
Jiwei Li
Bing Qin
Shuming Shi
DiffM
VLM
14
0
0
22 Dec 2023
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLM
DiffM
22
8
0
22 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
15
237
0
21 Dec 2023
DreamTuner: Single Image is Enough for Subject-Driven Generation
Miao Hua
Jiawei Liu
Fei Ding
Wei Liu
Jie Wu
Qian He
11
28
0
21 Dec 2023
StarVector: Generating Scalable Vector Graphics Code from Images
Juan A. Rodriguez
Shubham Agarwal
I. Laradji
Pau Rodríguez
David Vazquez
Christopher Pal
M. Pedersoli
38
6
0
17 Dec 2023
Rich Human Feedback for Text-to-Image Generation
Youwei Liang
Junfeng He
Gang Li
Peizhao Li
Arseniy Klimovskiy
...
Yiwen Luo
Yang Li
Kai Kohlhoff
Deepak Ramachandran
Vidhya Navalpakkam
EGVM
19
66
0
15 Dec 2023
HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
Hongyu Liu
Xuan Wang
Ziyu Wan
Yujun Shen
Yibing Song
Jing Liao
Qifeng Chen
DiffM
36
17
0
12 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
39
174
0
11 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
34
62
0
11 Dec 2023
ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models
Denis Zavadski
Johann-Friedrich Feiden
Carsten Rother
DiffM
44
5
0
11 Dec 2023
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods
Panos Achlioptas
Alexandros Benetatos
Iordanis Fostiropoulos
Dimitris Skourtis
18
8
0
11 Dec 2023
TabMT: Generating tabular data with masked transformers
Manbir Gulati
Paul F. Roysdon
LMTD
43
32
0
11 Dec 2023
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral
Enis Simsar
Federico Tombari
Pinar Yanardag
DiffM
VLM
28
26
0
11 Dec 2023
Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models
Zhipeng Bao
Yijun Li
Krishna Kumar Singh
Yu-Xiong Wang
Martial Hebert
20
8
0
10 Dec 2023
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Maomao Li
Yu Li
Tianyu Yang
Yunfei Liu
Dongxu Yue
Zhihui Lin
Dong Xu
VGen
10
8
0
10 Dec 2023
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh
Jianming Zhang
Qing Liu
Cameron Smith
Zhe-nan Lin
Liang Zheng
DiffM
28
11
0
08 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP
VLM
25
61
0
07 Dec 2023
Gen2Det: Generate to Detect
Saksham Suri
Fanyi Xiao
Animesh Sinha
Sean Culatana
Raghuraman Krishnamoorthi
Chenchen Zhu
Abhinav Shrivastava
VLM
DiffM
18
9
0
07 Dec 2023