Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Yuki Mitsufuji
VGen
366
6
0
04 Jun 2024
Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
Pengtao Chen
Mingzhu Shen
Peng Ye
Jianjian Cao
Chongjun Tu
C. Bouganis
Yiren Zhao
Tao Chen
406
91
0
03 Jun 2024
Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation
Mingyuan Zhou
Zhendong Wang
Huangjie Zheng
Hai Huang
VLM, DiffM
276
2
0
03 Jun 2024
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Jiatao Gu
Ying Shen
Shuangfei Zhai
Yizhe Zhang
Navdeep Jaitly
J. Susskind
283
14
0
31 May 2024
Text Guided Image Editing with Automatic Concept Locating and Forgetting
Jia Li
Lijie Hu
Zhixian He
Jingfeng Zhang
Tianhang Zheng
Haiyan Zhao
DiffM
245
13
0
30 May 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Vicky Zayats
Peter Chen
Melissa Ferrari
Dirk Padfield
AI4CE
212
1
0
29 May 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang
Alyssa Unell
Xiaohan Wang
Dhruba Ghosh
Yuchang Su
Ludwig Schmidt
Serena Yeung-Levy
VLM
274
67
0
28 May 2024
Training-free Editioning of Text-to-Image Models
Jinqi Wang
Yunfei Fu
Zhangcan Ding
Bailin Deng
Yu-Kun Lai
Yipeng Qin
DiffM, VLM
209
0
0
27 May 2024
EM Distillation for One-step Diffusion Models
Sirui Xie
Zhisheng Xiao
Diederik P. Kingma
Tingbo Hou
Ying Nian Wu
Kevin Patrick Murphy
Tim Salimans
Ben Poole
Ruiqi Gao
VLM, DiffM
321
47
0
27 May 2024
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing
Xinyu Zhang
Mengxue Kang
Fei Wei
Shuang Xu
Yuhe Liu
Lin Ma
MLLM, DiffM
223
2
0
27 May 2024
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
C. N. Vasconcelos
Abdullah Rashwan
Austin Waters
Trevor Walker
Keyang Xu
Jimmy Yan
...
Wenlei Zhou
Kevin Swersky
David J. Fleet
Jason Baldridge
Oliver Wang
208
4
0
27 May 2024
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
Harshit Varma
Dheeraj M. Nagaraj
Karthikeyan Shanmugam
VLM
485
7
0
27 May 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
347
15
0
27 May 2024
Towards Black-Box Membership Inference Attack for Diffusion Models
Jingwei Li
Jingyi Dong
Tianxing He
Jingzhao Zhang
451
8
0
25 May 2024
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
Mingyang Yi
Aoxue Li
Yi Xin
Zhenguo Li
DiffM
373
32
0
24 May 2024
Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
Aoxue Li
Mingyang Yi
Zhenguo Li
DiffM
227
1
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLM, DiffM
453
34
0
24 May 2024
Improved Distribution Matching Distillation for Fast Image Synthesis
Tianwei Yin
Michael Gharbi
Taesung Park
Richard Zhang
Eli Shechtman
Frédo Durand
William T. Freeman
DiffM
435
288
0
23 May 2024
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
Ling Yang
Bo-Wen Zeng
Jiaming Liu
Hong Li
Minghao Xu
Wentao Zhang
Shuicheng Yan
DiffM
218
30
0
23 May 2024
Learning Multi-dimensional Human Preference for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Sixian Zhang
Bohan Wang
Junqiang Wu
Yan Li
Chen Zhang
Zhongyuan Wang
EGVM
278
76
0
23 May 2024
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Neural Information Processing Systems (NeurIPS), 2024
Zhicheng Sun
Zhenhao Yang
Yang Jin
Haozhe Chi
Kun Xu
...
Hao Jiang
Di Zhang
Yang Song
Kun Gai
Yadong Mu
255
10
0
23 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Katherine Xu
Lingzhi Zhang
Jianbo Shi
452
29
0
23 May 2024
Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data
Tarun Kalluri
Jihyeon Janel Lee
Kihyuk Sohn
Sahil Singla
Manmohan Chandraker
Joseph Z. Xu
Jeremiah Liu
287
2
0
22 May 2024
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Gwanghyun Kim
Alonso Martinez
Yu-Chuan Su
Brendan Jou
José Lezama
...
Lijun Yu
Lu Jiang
A. Jansen
Jacob Walker
Krishna Somandepalli
203
17
0
22 May 2024
How to Trace Latent Generative Model Generated Images without Artificial Watermark?
Zhenting Wang
Vikash Sehwag
Chen Chen
Lingjuan Lyu
Dimitris N. Metaxas
Shiqing Ma
WIGM
273
18
0
22 May 2024
Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction
Maciej Kilian
Varun Jampani
Luke Zettlemoyer
DiffM
360
12
0
21 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Jinqiang Cui
212
6
0
21 May 2024
UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers
International Conference on Machine Learning (ICML), 2024
Duo Peng
Qi Ke
Jun Liu
321
8
0
18 May 2024
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
International Conference on Machine Learning (ICML), 2024
Weitao Feng
Wenbo Zhou
Jiyan He
Jie Zhang
Tianyi Wei
Guanlin Li
Tianwei Zhang
Weiming Zhang
Neng H. Yu
387
45
0
18 May 2024
Compositional Text-to-Image Generation with Dense Blob Representations
International Conference on Machine Learning (ICML), 2024
Weili Nie
Sifei Liu
Morteza Mardani
Chao Liu
Benjamin Eckart
Arash Vahdat
DiffM
299
36
0
14 May 2024
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation
Shengyuan Liu
Bo Wang
Ye Ma
Te Yang
Xipeng Cao
Quan Chen
Han Li
Di Dong
Peng Jiang
EGVM
222
3
0
11 May 2024
Distilling Diffusion Models into Conditional GANs
European Conference on Computer Vision (ECCV), 2024
Minguk Kang
Richard Zhang
Connelly Barnes
Sylvain Paris
Suha Kwak
Jaesik Park
Eli Shechtman
Jun-Yan Zhu
Taesung Park
784
75
0
09 May 2024
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Xuehai He
Jian Zheng
Jacob Zhiyuan Fang
Robinson Piramuthu
Mohit Bansal
Vicente Ordonez
Gunnar Sigurdsson
Nanyun Peng
Xin Eric Wang
DiffM
229
4
0
08 May 2024
Generated Contents Enrichment
Mahdi Naseri
Jiayan Qiu
Zhou Wang
341
0
0
06 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen, LM&Ro
362
82
0
06 May 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
International Conference on Machine Learning (ICML), 2024
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
MLLM
253
32
0
03 May 2024
Customizing Text-to-Image Models with a Single Image Pair
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Maxwell Jones
Sheng-Yu Wang
Nupur Kumari
David Bau
Jun-Yan Zhu
DiffM
316
37
0
02 May 2024
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Computer Vision and Pattern Recognition (CVPR), 2024
Kelvin C. K. Chan
Yang Zhao
Xuhui Jia
Ming-Hsuan Yang
Huisheng Wang
225
4
0
02 May 2024
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe
Sunayana Rane
Zachary Berger
Yonatan Bitton
Jaemin Cho
...
Zarana Parekh
Jordi Pont-Tuset
Garrett Tanzer
Su Wang
Jason Baldridge
274
96
0
30 Apr 2024
Stylus: Automatic Adapter Selection for Diffusion Models
Michael Luo
Justin Wong
Brandon Trabucco
Yanping Huang
Joseph E. Gonzalez
Zhifeng Chen
Ruslan Salakhutdinov
Ion Stoica
DiffM
224
17
0
29 Apr 2024
SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes
Georgia Baltsou
Ioannis Sarridis
C. Koutlis
Symeon Papadopoulos
286
8
0
26 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
986
32
0
25 Apr 2024
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
Zehuan Huang
Hongxing Fan
Lipeng Wang
Lu Sheng
DiffM
234
20
0
23 Apr 2024
Do not think pink elephant!
Kyomin Hwang
Suyoung Kim
Junhoo Lee
Nojun Kwak
69
2
0
22 Apr 2024
Accelerating Image Generation with Sub-path Linear Approximation Model
Chen Xu
Tian-Shu Song
Weixin Feng
Xubin Li
Bangyu Xiang
Bo Zheng
Limin Wang
275
15
0
22 Apr 2024
Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
Ali Naseh
Katherine Thai
Mohit Iyyer
Amir Houmansadr
260
12
0
21 Apr 2024
GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models
Sai Sree Harsha
Ambareesh Revanur
Dhwanit Agarwal
Shradha Agrawal
VGen, DiffM
146
6
0
18 Apr 2024
Customizing Text-to-Image Diffusion with Camera Viewpoint Control
Nupur Kumari
Grace Su
Richard Zhang
Taesung Park
Eli Shechtman
Jun-Yan Zhu
DiffM
268
5
0
18 Apr 2024
COCONut: Modernizing COCO Segmentation
XueQing Deng
Qihang Yu
Peng Wang
Xiaohui Shen
Liang-Chieh Chen
206
21
0
12 Apr 2024
Implicit and Explicit Language Guidance for Diffusion-based Visual Perception
Hefeng Wang
Jiale Cao
Jin Xie
Aiping Yang
Yanwei Pang
VLM, DiffM
262
2
0
11 Apr 2024