ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.10789
  4. Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM
ArXiv (abs)PDFHTMLHuggingFace (4 upvotes)

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown
A Gauss-Newton Approach for Min-Max Optimization in Generative
  Adversarial Networks
A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial NetworksIEEE International Joint Conference on Neural Network (IJCNN), 2024
Neel Mishra
Bamdev Mishra
Pratik Jawanpuria
Pawan Kumar
GAN
198
2
0
10 Apr 2024
StoryImager: A Unified and Efficient Framework for Coherent Story
  Visualization and Completion
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Ming Tao
Bing-Kun Bao
Hao Tang
Yaowei Wang
Changsheng Xu
DiffM
286
19
0
09 Apr 2024
YaART: Yet Another ART Rendering Technology
YaART: Yet Another ART Rendering Technology
Sergey Kastryulin
Artem Konev
Alexander Shishenya
Eugene Lyapustin
Artem Khurshudov
...
Dmitrii Kornilov
Mikhail Romanov
Artem Babenko
Sergei Ovcharenko
Valentin Khrulkov
EGVM
211
3
0
08 Apr 2024
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise
  Optimization
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo
Jinlin Liu
Miaomiao Cui
Jiankai Li
Hongyu Yang
Di Huang
381
80
0
06 Apr 2024
Aligning Diffusion Models by Optimizing Human Utility
Aligning Diffusion Models by Optimizing Human Utility
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Yusuke Kato
Kazuki Kozuka
297
66
0
06 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from
  Interleaved Multimodal Inputs
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen3DV
400
6
0
05 Apr 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt
  Coherence Metrics with T2IScoreScore (TS2)
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
Michael Stephen Saxon
Fatima Jahara
Mahsa Khoshnoodi
Yujie Lu
Aditya Sharma
William Y. Wang
EGVM
296
12
0
05 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency
  Determines Multimodal Model Performance
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model PerformanceNeural Information Processing Systems (NeurIPS), 2024
Vishaal Udandarao
Christian Schroeder de Witt
Adhiraj Ghosh
Yash Sharma
Juil Sock
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
705
80
0
04 Apr 2024
Many-to-many Image Generation with Auto-regressive Diffusion Models
Many-to-many Image Generation with Auto-regressive Diffusion Models
Ying Shen
Yizhe Zhang
Shuangfei Zhai
Lifu Huang
J. Susskind
Jiatao Gu
283
8
0
03 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
  Prediction
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale PredictionNeural Information Processing Systems (NeurIPS), 2024
Keyu Tian
Yi Jiang
Zehuan Yuan
Zehuan Yuan
Liwei Wang
VGen
410
715
0
03 Apr 2024
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image
  Models
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image ModelsInternational Conference on Learning Representations (ICLR), 2024
Kyuyoung Kim
Jongheon Jeong
Minyong An
Mohammad Ghavamzadeh
Krishnamurthy Dvijotham
Jinwoo Shin
Kimin Lee
EGVM
178
6
0
02 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
MotionChain: Conversational Motion Controllers via Multimodal PromptsEuropean Conference on Computer Vision (ECCV), 2024
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGenLRM
274
21
0
02 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion
  Models
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
304
22
0
01 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
450
342
0
01 Apr 2024
A Unified and Interpretable Emotion Representation and Expression
  Generation
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva
Mykyta Holubakha
Andela Ilic
Saman Motamed
Luc Van Gool
D. Paudel
151
9
0
01 Apr 2024
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Huikang Yu
Hao Luo
Fan Wang
Feng Zhao
157
12
0
01 Apr 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large
  Language Model
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao
Yue Yang
Kaipeng Zhang
Wenqi Shao
Yuxin Zhang
Yu Qiao
Ping Luo
Rongrong Ji
LM&RoLLMAGVLM
127
7
0
31 Mar 2024
BAMM: Bidirectional Autoregressive Motion Model
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Pu Wang
Minwoo Lee
Srijan Das
Chong Chen
VGen
328
40
0
28 Mar 2024
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context
  in Editable Face Generation
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
Mengmeng Wang
Yan Chen
Wenbin An
Yuzhe Yao
Guang Dai
Qianying Wang
Yong-Jin Liu
Jingdong Wang
DiffM
221
8
0
28 Mar 2024
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Yutong He
Avi Schwarzschild
Naoki Murata
Yiding Jiang
J. Williams
George Pappas
Hamed Hassani
Yuki Mitsufuji
Ruslan Salakhutdinov
J. Zico Kolter
DiffM
342
8
0
28 Mar 2024
TextCraftor: Your Text Encoder Can be Image Quality Controller
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li
Xian Liu
Vidit Goel
Ju Hu
Yerlan Idelbayev
Dhritiman Sagar
Yanzhi Wang
Sergey Tulyakov
Jian Ren
301
27
0
27 Mar 2024
Attention Calibration for Disentangled Text-to-Image Personalization
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang
Mengping Yang
Qin Zhou
Zhe Wang
357
29
0
27 Mar 2024
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Oscar Manas
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Aishwarya Agrawal
Adriana Romero Soriano
M. Drozdzal
279
63
0
26 Mar 2024
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image
  Generation
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang
Yasi Zhang
Zhiyuan Fang
Yingnian Wu
Yonatan Bisk
Feng Gao
EGVM
255
10
0
25 Mar 2024
Generative Active Learning for Image Synthesis Personalization
Generative Active Learning for Image Synthesis PersonalizationACM Multimedia (MM), 2024
Xu-Lu Zhang
Wengyu Zhang
Xiao Wei
Jinlin Wu
Zhaoxiang Zhang
Zhen Lei
Qing Li
292
5
0
22 Mar 2024
CLIP-VQDiffusion : Langauge Free Training of Text To Image generation
  using CLIP and vector quantized diffusion model
CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model
S. Han
Joohee Kim
DiffMCLIP
187
4
0
22 Mar 2024
When Do We Not Need Larger Vision Models?
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLMLRM
407
70
0
19 Mar 2024
Can AI Outperform Human Experts in Creating Social Media Creatives?
Can AI Outperform Human Experts in Creating Social Media Creatives?
Eunkyung Park
Raymond K. Wong
Junbum Kwon
210
1
0
19 Mar 2024
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion
  Distillation
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion DistillationACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Axel Sauer
Frederic Boesel
Tim Dockhorn
A. Blattmann
Patrick Esser
Robin Rombach
DiffM
353
221
0
18 Mar 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image
  Synthesis via Layer-Collaborative Diffusion Model
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion ModelEuropean Conference on Computer Vision (ECCV), 2024
Runhu Huang
Kaixin Cai
Jianhua Han
Xiaodan Liang
Renjing Pei
Guansong Lu
Songcen Xu
Wei Zhang
Hang Xu
DiffM
179
14
0
18 Mar 2024
LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense
  Knowledge
LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge
Yuhe Liu
Mengxue Kang
Zengchang Qin
Xiangxiang Chu
NAIVLM
158
0
0
18 Mar 2024
Automated data processing and feature engineering for deep learning and
  big data applications: a survey
Automated data processing and feature engineering for deep learning and big data applications: a surveyJournal of Information and Intelligence (JII), 2024
A. Mumuni
F. Mumuni
TPM
267
129
0
18 Mar 2024
Reward Guided Latent Consistency Distillation
Reward Guided Latent Consistency Distillation
Jiachen Li
Weixi Feng
Wenhu Chen
William Y. Wang
EGVM
212
23
0
16 Mar 2024
Desigen: A Pipeline for Controllable Design Template Generation
Desigen: A Pipeline for Controllable Design Template GenerationComputer Vision and Pattern Recognition (CVPR), 2024
Haohan Weng
Danqing Huang
Yu Qiao
Zheng Hu
Chin-Yew Lin
Tong Zhang
Chong Chen
DiffM
202
22
0
14 Mar 2024
Follow-Your-Click: Open-domain Regional Image Animation via Short
  Prompts
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Yue Ma
Yin-Yin He
Hongfa Wang
Andong Wang
Zhiheng Liu
...
Xiu Li
Zhifeng Li
H. Shum
Wei Liu
Qifeng Chen
VGenDiffM
277
61
0
13 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video
  Production
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAGVGen
165
10
0
12 Mar 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with
  Auto-Generated Data
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated DataNeural Information Processing Systems (NeurIPS), 2024
Jialu Li
Jaemin Cho
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
MoMeDiffM
250
15
0
11 Mar 2024
DivCon: Divide and Conquer for Complex Numerical and Spatial Reasoning in Text-to-Image Generation
DivCon: Divide and Conquer for Complex Numerical and Spatial Reasoning in Text-to-Image Generation
Yuhao Jia
Wenhan Tan
DiffM
314
1
0
11 Mar 2024
VideoElevator: Elevating Video Generation Quality with Versatile
  Text-to-Image Diffusion Models
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang
Yuxiang Wei
Xianhui Lin
Zheng Hui
Peiran Ren
Xuansong Xie
Xiangyang Ji
Wangmeng Zuo
VGen
220
10
0
08 Mar 2024
Towards Effective Usage of Human-Centric Priors in Diffusion Models for
  Text-based Human Image Generation
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image GenerationComputer Vision and Pattern Recognition (CVPR), 2024
Junyan Wang
Zhenhong Sun
Zhiyu Tan
Xuanbai Chen
Weihua Chen
Hao Li
Cheng Zhang
Yang Song
249
16
0
08 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
284
246
0
08 Mar 2024
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
CogView3: Finer and Faster Text-to-Image Generation via Relay DiffusionEuropean Conference on Computer Vision (ECCV), 2024
Wendi Zheng
Jiayan Teng
Zhuoyi Yang
Weihan Wang
Jidong Chen
Xiaotao Gu
Yuxiao Dong
Ming Ding
Jie Tang
DiffM
254
63
0
08 Mar 2024
StereoDiffusion: Training-Free Stereo Image Generation Using Latent
  Diffusion Models
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
Lezhong Wang
J. Frisvad
Mark Bo Jensen
Siavash Bigdeli
DiffM
257
21
0
08 Mar 2024
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Pix2Gif: Motion-Guided Diffusion for GIF GenerationEuropean Conference on Computer Vision (ECCV), 2024
Hitesh Kandala
Jianfeng Gao
Jianwei Yang
VGenDiffM
232
5
0
07 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?International Conference on Learning Representations (ICLR), 2024
Ibrahim Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander DÁmour
Xiao-Qi Zhai
211
30
0
07 Mar 2024
Discriminative Probing and Tuning for Text-to-Image Generation
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu
Wenjie Wang
Chak Tou Leong
Hanwang Zhang
Liqiang Nie
Tat-Seng Chua
374
12
0
07 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
2.3K
2,724
0
05 Mar 2024
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
  Virtual Try-on
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Yuhao Xu
Tao Gu
Weifeng Chen
Chengcai Chen
DiffM
298
130
0
04 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
366
338
0
29 Feb 2024
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized
  Diffusion Models
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Shyam Marjit
Harshit Singh
Nityanand Mathur
Sayak Paul
Chia-Mu Yu
Pin-Yu Chen
DiffM
313
12
0
27 Feb 2024
Previous
123...91011...192021
Next