Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,011 papers shown

Looking at words and points with attention: a benchmark for text-to-shape coherence

149

14 Sep 2023

Masked Generative Modeling with Enhanced Sampling Scheme

Daesoo Lee

Erlend Aune

Sara Malacarne

DiffM

193

14 Sep 2023

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

International Conference on Learning Representations (ICLR), 2023

Qiang Liu

595

312

12 Sep 2023

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

...

Min Zheng

166

11 Sep 2023

ITI-GEN: Inclusive Text-to-Image Generation

IEEE International Conference on Computer Vision (ICCV), 2023

251

11 Sep 2023

NExT-GPT: Any-to-Any Multimodal LLM

International Conference on Machine Learning (ICML), 2023

Hao Fei

Wei Ji

379

717

11 Sep 2023

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

International Conference on Learning Representations (ICLR), 2023

Kun Xu

...

240

09 Sep 2023

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

International Journal of Computer Vision (IJCV), 2023

171

08 Sep 2023

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2023

262

07 Sep 2023

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

Zuxuan Wu

Wei Zhang

Hang Xu

224

07 Sep 2023

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

...

Luke Zettlemoyer

277

164

05 Sep 2023

Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface

168

05 Sep 2023

MAGMA: Music Aligned Generative Motion Autodecoder

Sohan Anisetty

Amit Raj

James Hays

151

03 Sep 2023

Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities

140

02 Sep 2023

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

...

Pareesa Ameneh Golnari

Yuxiong He

254

02 Sep 2023

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Errui Ding

Jingdong Wang

VGen

332

01 Sep 2023

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

IEEE International Conference on Computer Vision (ICCV), 2023

Xiaodan Liang

Wei Zhang

Hang Xu

231

31 Aug 2023

Priority-Centric Human Motion Generation in Discrete Latent Space

IEEE International Conference on Computer Vision (ICCV), 2023

448

28 Aug 2023

AI-Generated Content (AIGC) for Various Data Modalities: A Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

Lin Geng Foo

Hossein Rahmani

Jing Liu

760

27 Aug 2023

Dense Text-to-Image Generation with Attention Modulation

IEEE International Conference on Computer Vision (ICCV), 2023

Jun-Yan Zhu

281

181

24 Aug 2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

International Conference on Learning Representations (ICLR), 2023

Jinyi Hu

...

Yankai Lin

Jiao Xue

Dahai Li

Zhiyuan Liu

Maosong Sun

MLLM VLM

276

23 Aug 2023

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

Neural Information Processing Systems (NeurIPS), 2023

Pieter-Jan Kindermans

P. Voigtlaender

VGen

332

22 Aug 2023

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

IEEE International Conference on Computer Vision (ICCV), 2023

Xujie Zhang

Binbin Yang

Michael C. Kampffmeyer

Hang Xu

Xiaodan Liang

DiffM

396

22 Aug 2023

Backdooring Textual Inversion for Concept Censorship

269

21 Aug 2023

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Computer Vision and Pattern Recognition (CVPR), 2023

Zuxuan Wu

268

107

18 Aug 2023

Edit Temporal-Consistent Videos with Image Diffusion Model

260

17 Aug 2023

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Qi Wu

205

16 Aug 2023

Painter: Teaching Auto-regressive Language Models to Draw Sketches

Reza Pourreza

Apratim Bhattacharyya

174

16 Aug 2023

Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

ACM Multimedia (ACM MM), 2023

173

14 Aug 2023

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2023

284

14 Aug 2023

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

323

1,264

13 Aug 2023

White-box Membership Inference Attacks against Diffusion Models

Proceedings on Privacy Enhancing Technologies (PoPETs), 2023

289

11 Aug 2023

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

ACM Multimedia (ACM MM), 2023

Hao Fei

324

130

09 Aug 2023

Circumventing Concept Erasure Methods For Text-to-Image Generative Models

International Conference on Learning Representations (ICLR), 2023

238

03 Aug 2023

Guiding Image Captioning Models Toward More Specific Captions

IEEE International Conference on Computer Vision (ICCV), 2023

Simon Kornblith

Lala Li

Zirui Wang

Thao Nguyen

320

31 Jul 2023

Visual Instruction Inversion: Image Editing via Visual Prompting

Thao Nguyen

142

26 Jul 2023

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

IEEE International Conference on Computer Vision (ICCV), 2023

Shilin Lu

Yanzhu Liu

A. Kong

622

185

24 Jul 2023

Divide & Bind Your Attention for Improved Generative Semantic Nursing

British Machine Vision Conference (BMVC), 2023

339

20 Jul 2023

Text2Layer: Layered Image Generation using Latent Diffusion Model

193

19 Jul 2023

Beyond the ML Model: Applying Safety Engineering Frameworks to Text-to-Image Development

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023

Shalaleh Rismani

Renee Shelby

A. Smart

Renelito Delos Santos

AJung Moon

Negar Rostamzadeh

228

19 Jul 2023

Complexity Matters: Rethinking the Latent Space for Generative Modeling

Neural Information Processing Systems (NeurIPS), 2023

Tianyang Hu

320

17 Jul 2023

Zero-Shot Image Harmonization with Generative Model Prior

IEEE Transactions on Multimedia (IEEE TMM), 2023

302

17 Jul 2023

Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?

Neural Information Processing Systems (NeurIPS), 2023

197

15 Jul 2023

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Computer Vision and Pattern Recognition (CVPR), 2023

Nataniel Ruiz

Yuanzhen Li

Varun Jampani

Wei Wei

Tingbo Hou

Yael Pritch

Neal Wadhwa

Michael Rubinstein

Kfir Aberman

DiffM

212

228

13 Jul 2023

Emu: Generative Pretraining in Multimodality

International Conference on Learning Representations (ICLR), 2023

Hongcheng Gao

359

155

11 Jul 2023

Diffusion idea exploration for art generation

N. Verma

DiffM

225

11 Jul 2023

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

Neural Information Processing Systems (NeurIPS), 2023

Jaskirat Singh

Liang Zheng

299

10 Jul 2023

DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer

348

09 Jul 2023

Text-Guided Synthesis of Eulerian Cinemagraphs

ACM Transactions on Graphics (TOG), 2023

Hsin-Ying Lee

206

06 Jul 2023

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

International Conference on Learning Representations (ICLR), 2023

1.7K

3,833

04 Jul 2023