ResearchTrend.AI
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 865 papers shown
YaART: Yet Another ART Rendering Technology
Sergey Kastryulin
Artem Konev
Alexander Shishenya
Eugene Lyapustin
Artem Khurshudov
...
Dmitrii Kornilov
Mikhail Romanov
Artem Babenko
Sergei Ovcharenko
Valentin Khrulkov
EGVM
28
1
0
08 Apr 2024
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo
Jinlin Liu
Miaomiao Cui
Jiankai Li
Hongyu Yang
Di Huang
23
25
0
06 Apr 2024
Aligning Diffusion Models by Optimizing Human Utility
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Yusuke Kato
Kazuki Kozuka
105
27
0
06 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen
3DV
200
4
0
05 Apr 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
Michael Stephen Saxon
Fatima Jahara
Mahsa Khoshnoodi
Yujie Lu
Aditya Sharma
William Yang Wang
EGVM
20
9
0
05 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
118
44
0
04 Apr 2024
Many-to-many Image Generation with Auto-regressive Diffusion Models
Ying Shen
Yizhe Zhang
Shuangfei Zhai
Lifu Huang
J. Susskind
Jiatao Gu
38
6
0
03 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian
Yi-Xin Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
25
248
0
03 Apr 2024
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
Kyuyoung Kim
Jongheon Jeong
Minyong An
Mohammad Ghavamzadeh
Krishnamurthy Dvijotham
Jinwoo Shin
Kimin Lee
EGVM
29
6
0
02 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGen
LRM
29
10
0
02 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
50
12
0
01 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
37
125
0
01 Apr 2024
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva
Mykyta Holubakha
Andela Ilic
Saman Motamed
Luc Van Gool
D. Paudel
25
2
0
01 Apr 2024
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Huikang Yu
Hao Luo
Fan Wang
Feng Zhao
26
10
0
01 Apr 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao
Yue Yang
Kaipeng Zhang
Wenqi Shao
Yuxin Zhang
Yu Qiao
Ping Luo
Rongrong Ji
LM&Ro
LLMAG
VLM
29
3
0
31 Mar 2024
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Pu Wang
Minwoo Lee
Srijan Das
C. L. P. Chen
VGen
27
20
0
28 Mar 2024
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
Mengmeng Wang
Yan Chen
Wenbin An
Yuzhe Yao
Guang Dai
Qianying Wang
Yong-Jin Liu
Jingdong Wang
DiffM
38
4
0
28 Mar 2024
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Yutong He
Alexander Robey
Naoki Murata
Yiding Jiang
J. Williams
George Pappas
Hamed Hassani
Yuki Mitsufuji
Ruslan Salakhutdinov
J. Zico Kolter
DiffM
91
4
0
28 Mar 2024
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li
Xian Liu
Anil Kag
Ju Hu
Yerlan Idelbayev
Dhritiman Sagar
Yanzhi Wang
Sergey Tulyakov
Jian Ren
34
13
0
27 Mar 2024
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang
Mengping Yang
Qin Zhou
Zhe Wang
22
15
0
27 Mar 2024
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Oscar Manas
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Aishwarya Agrawal
Adriana Romero Soriano
M. Drozdzal
29
26
0
26 Mar 2024
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang
Yasi Zhang
Zhiyuan Fang
Yingnian Wu
Yonatan Bisk
Feng Gao
EGVM
34
6
0
25 Mar 2024
Generative Active Learning for Image Synthesis Personalization
Xu-Lu Zhang
Wengyu Zhang
Xiao Wei
Jinlin Wu
Zhaoxiang Zhang
Zhen Lei
Qing Li
99
1
0
22 Mar 2024
CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model
S. Han
Joohee Kim
DiffM
CLIP
32
1
0
22 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
44
40
0
19 Mar 2024
Can AI Outperform Human Experts in Creating Social Media Creatives?
Eunkyung Park
Raymond K. Wong
Junbum Kwon
30
0
0
19 Mar 2024
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Axel Sauer
Frederic Boesel
Tim Dockhorn
A. Blattmann
Patrick Esser
Robin Rombach
DiffM
24
104
0
18 Mar 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
Runhu Huang
Kaixin Cai
Jianhua Han
Xiaodan Liang
Renjing Pei
Guansong Lu
Songcen Xu
Wei Zhang
Hang Xu
DiffM
20
3
0
18 Mar 2024
LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge
Yuhe Liu
Mengxue Kang
Zengchang Qin
Xiangxiang Chu
NAI
VLM
33
0
0
18 Mar 2024
Automated data processing and feature engineering for deep learning and big data applications: a survey
A. Mumuni
F. Mumuni
TPM
35
45
0
18 Mar 2024
Reward Guided Latent Consistency Distillation
Jiachen Li
Weixi Feng
Wenhu Chen
William Yang Wang
EGVM
21
11
0
16 Mar 2024
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng
Danqing Huang
Yu Qiao
Zheng Hu
Chin-Yew Lin
Tong Zhang
C. L. P. Chen
DiffM
19
14
0
14 Mar 2024
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Yue Ma
Yin-Yin He
Hongfa Wang
Andong Wang
Chenyang Qi
...
Xiu Li
Zhifeng Li
H. Shum
Wei Liu
Qifeng Chen
VGen
DiffM
104
37
0
13 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG
VGen
28
7
0
12 Mar 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Jialu Li
Jaemin Cho
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
MoMe
DiffM
34
8
0
11 Mar 2024
DivCon: Divide and Conquer for Progressive Text-to-Image Generation
Yuhao Jia
Wenhan Tan
DiffM
39
1
0
11 Mar 2024
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang
Yuxiang Wei
Xianhui Lin
Zheng Hui
Peiran Ren
Xuansong Xie
Xiangyang Ji
Wangmeng Zuo
VGen
38
6
0
08 Mar 2024
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang
Zhenhong Sun
Zhiyu Tan
Xuanbai Chen
Weihua Chen
Hao Li
Cheng Zhang
Yang Song
27
9
0
08 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
57
39
0
08 Mar 2024
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng
Jiayan Teng
Zhuoyi Yang
Weihan Wang
Jidong Chen
Xiaotao Gu
Yuxiao Dong
Ming Ding
Jie Tang
DiffM
19
34
0
08 Mar 2024
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
Lezhong Wang
J. Frisvad
Mark Bo Jensen
Siavash Bigdeli
DiffM
27
10
0
08 Mar 2024
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala
Jianfeng Gao
Jianwei Yang
VGen
DiffM
30
3
0
07 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim M. Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander D'Amour
Xiao-Qi Zhai
26
16
0
07 Mar 2024
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu
Wenjie Wang
Yongqi Li
Hanwang Zhang
Liqiang Nie
Tat-Seng Chua
31
7
0
07 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
68
1,047
0
05 Mar 2024
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Yuhao Xu
Tao Gu
Weifeng Chen
Chengcai Chen
DiffM
27
49
0
04 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
70
177
0
29 Feb 2024
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Shyam Marjit
Harshit Singh
Nityanand Mathur
Sayak Paul
Chia-Mu Yu
Pin-Yu Chen
DiffM
25
2
0
27 Feb 2024
Disentangled 3D Scene Generation with Layout Learning
Dave Epstein
Ben Poole
B. Mildenhall
Alexei A. Efros
Aleksander Holynski
CoGe
OCL
3DV
40
20
0
26 Feb 2024
Contextualized Diffusion Models for Text-Guided Image and Video Generation
Ling Yang
Zhilong Zhang
Zhaochen Yu
Jingwei Liu
Minkai Xu
Stefano Ermon
Bin Cui
31
4
0
26 Feb 2024