Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022


Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 1,010 papers shown

Unified Reward Model for Multimodal Understanding and Generation

395

07 Mar 2025

CacheQuant: Comprehensively Accelerated Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2025

202

03 Mar 2025

Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data

Computer Vision and Pattern Recognition (CVPR), 2025

Haoxin Li

Boyang Li

CoGe

693

03 Mar 2025

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations

Computer Vision and Pattern Recognition (CVPR), 2025

541

02 Mar 2025

Speculative Decoding and Beyond: An In-Depth Survey of Techniques

742

27 Feb 2025

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

537

27 Feb 2025

Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model

IEEE Transactions on Multimedia (TMM), 2025

151

24 Feb 2025

Unified Prompt Attack Against Text-to-Image Generation Models

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

262

23 Feb 2025

Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

Neural Information Processing Systems (NeurIPS), 2024

461

21 Feb 2025

Accelerating Diffusion Transformers with Token-wise Feature Caching

International Conference on Learning Representations (ICLR), 2024

423

20 Feb 2025

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

690

13 Feb 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths

702

10 Feb 2025

LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models

351

10 Feb 2025

FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing

Jinya Sakurai

Issei Sato

509

06 Feb 2025

HuViDPO:Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment

256

02 Feb 2025

CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models

200

01 Feb 2025

PreciseCam: Precise Camera Control for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2025

Yannick Hold-Geoffroy

Xin Sun

Diego F. F. Gutierrez

DiffM VGen

216

22 Jan 2025

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

338

21 Jan 2025

A Comprehensive Survey of Foundation Models in Medicine

IEEE Reviews in Biomedical Engineering (RBME), 2024

771

17 Jan 2025

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

433

13 Jan 2025

Personalized Preference Fine-tuning of Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2025

132

11 Jan 2025

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2025

...

306

11 Jan 2025

INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models

1.1K

10 Jan 2025

EditAR: Unified Conditional Generation with Autoregressive Models

Computer Vision and Pattern Recognition (CVPR), 2025

253

08 Jan 2025

Learning the Language of Protein Structure

275

08 Jan 2025

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

Patterns (Patterns), 2024

451

03 Jan 2025

TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions

325

02 Jan 2025

Grid Diffusion Models for Text-to-Video Generation

Computer Vision and Pattern Recognition (CVPR), 2024

Taegyeong Lee

Soyeong Kwon

Taehwan Kim

313

31 Dec 2024

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

Yuntao Chen

Yuqi Wang

Rundong Wang

1.0K

24 Dec 2024

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

...

667

22 Dec 2024

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

427

20 Dec 2024

Parallelized Autoregressive Visual Generation

Computer Vision and Pattern Recognition (CVPR), 2024

649

19 Dec 2024

Next Patch Prediction for Autoregressive Visual Generation

...

633

19 Dec 2024

Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity

225

18 Dec 2024

Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

317

15 Dec 2024

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Computer Vision and Pattern Recognition (CVPR), 2024

...

305

12 Dec 2024

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2024

365

12 Dec 2024

Mojito: Motion Trajectory and Intensity Control for Video Generation

714

12 Dec 2024

[MASK] is All You Need

Vincent Tao Hu

Bjorn Ommer

DiffM

528

09 Dec 2024

Nested Diffusion Models Using Hierarchical Latent Priors

Computer Vision and Pattern Recognition (CVPR), 2024

365

08 Dec 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

1.1K

05 Dec 2024

MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model

Shan Yang

DiffM

213

02 Dec 2024

IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models

485

02 Dec 2024

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

664

02 Dec 2024

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Computer Vision and Pattern Recognition (CVPR), 2024

Xin Xie

Dong Gong

582

01 Dec 2024

Continuous Concepts Removal in Text-to-image Diffusion Models

534

30 Nov 2024

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

315

28 Nov 2024

Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects

Computer Vision and Pattern Recognition (CVPR), 2024

427

28 Nov 2024

Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation

261

27 Nov 2024

ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts

454

27 Nov 2024