arXiv:2206.10789

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022

Vijay Vasudevan

Burcu Karagol Ayan

Jason Baldridge

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

Takashi Shibuya

Shusuke Takahashi

366

6

0

04 Jun 2024

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Peng Ye

Tao Chen

406

91

0

03 Jun 2024

Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation

Mingyuan Zhou

Huangjie Zheng

276

2

0

03 Jun 2024

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

283

14

0

31 May 2024

Text Guided Image Editing with Automatic Concept Locating and Forgetting

Lijie Hu

245

13

0

30 May 2024

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Melissa Ferrari

212

1

0

29 May 2024

Why are Visually-Grounded Language Models Bad at Image Classification?

Serena Yeung-Levy

274

67

0

28 May 2024

Training-free Editioning of Text-to-Image Models

209

0

0

27 May 2024

EM Distillation for One-step Diffusion Models

Diederik P. Kingma

Kevin Patrick Murphy

Ruiqi Gao

321

47

0

27 May 2024

TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing

223

2

0

27 May 2024

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

C. N. Vasconcelos

Abdullah Rashwan

Austin Waters

...

David J. Fleet

Jason Baldridge

208

4

0

27 May 2024

Glauber Generative Model: Discrete Diffusion Models via Binary Classification

Dheeraj M. Nagaraj

Karthikeyan Shanmugam

485

7

0

27 May 2024

Ensembling Diffusion Models via Adaptive Feature Aggregation

Zhiwei Jiang

347

15

0

27 May 2024

Towards Black-Box Membership Inference Attack for Diffusion Models

451

8

0

25 May 2024

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model

373

32

0

24 May 2024

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion

227

1

0

24 May 2024

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception

Run Luo

Yunshui Li

Ting-En Lin

...

Xiaobo Xia

Min Yang

453

34

0

24 May 2024

Improved Distribution Matching Distillation for Fast Image Synthesis

William T. Freeman

435

288

0

23 May 2024

EditWorld: Simulating World Dynamics for Instruction-Following Image Editing

Jiaming Liu

218

30

0

23 May 2024

Learning Multi-dimensional Human Preference for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2024

278

76

0

23 May 2024

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

Neural Information Processing Systems (NeurIPS), 2024

Kun Xu

...

255

10

0

23 May 2024

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

452

29

0

23 May 2024

Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data

Jihyeon Janel Lee

Manmohan Chandraker

287

2

0

22 May 2024

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Alonso Martinez

...

Krishna Somandepalli

203

17

0

22 May 2024

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

Dimitris N. Metaxas

273

18

0

22 May 2024

Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction

Luke Zettlemoyer

360

12

0

21 May 2024

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models

212

6

0

21 May 2024

UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers

International Conference on Machine Learning (ICML), 2024

Jun Liu

321

8

0

18 May 2024

AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

International Conference on Machine Learning (ICML), 2024

387

45

0

18 May 2024

Compositional Text-to-Image Generation with Dense Blob Representations

International Conference on Machine Learning (ICML), 2024

Morteza Mardani

Benjamin Eckart

299

36

0

14 May 2024

Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

222

3

0

11 May 2024

Distilling Diffusion Models into Conditional GANs

European Conference on Computer Vision (ECCV), 2024

Connelly Barnes

Jun-Yan Zhu

784

75

0

09 May 2024

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

Jacob Zhiyuan Fang

Robinson Piramuthu

Mohit Bansal

Vicente Ordonez

Gunnar Sigurdsson

229

4

0

08 May 2024

Generated Contents Enrichment

341

0

0

06 May 2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

...

362

82

0

06 May 2024

Auto-Encoding Morph-Tokens for Multimodal LLM

International Conference on Machine Learning (ICML), 2024

253

32

0

03 May 2024

Customizing Text-to-Image Models with a Single Image Pair

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024

316

37

0

02 May 2024

Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

Computer Vision and Pattern Recognition (CVPR), 2024

Kelvin C. K. Chan

Ming-Hsuan Yang

225

4

0

02 May 2024

DOCCI: Descriptions of Connected and Contrasting Images

...

Jordi Pont-Tuset

Jason Baldridge

274

96

0

30 Apr 2024

Stylus: Automatic Adapter Selection for Diffusion Models

Brandon Trabucco

Joseph E. Gonzalez

Ruslan Salakhutdinov

224

17

0

29 Apr 2024

SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

Georgia Baltsou

Ioannis Sarridis

Symeon Papadopoulos

286

8

0

26 Apr 2024

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Isabela Albuquerque

...

Jordi Pont-Tuset

Aida Nematzadeh

Anant Nawalgaria

986

32

0

25 Apr 2024

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

234

20

0

23 Apr 2024

Do not think pink elephant!

Nojun Kwak

69

2

0

22 Apr 2024

Accelerating Image Generation with Sub-path Linear Approximation Model

275

15

0

22 Apr 2024

Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images

Amir Houmansadr

260

12

0

21 Apr 2024

GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models

Sai Sree Harsha

Ambareesh Revanur

Dhwanit Agarwal

Shradha Agrawal

146

6

0

18 Apr 2024

Customizing Text-to-Image Diffusion with Camera Viewpoint Control

Jun-Yan Zhu

268

5

0

18 Apr 2024

COCONut: Modernizing COCO Segmentation

Peng Wang

Liang-Chieh Chen

206

21

0

12 Apr 2024

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

Jiale Cao

Jin Xie

262

2

0

11 Apr 2024
