Zero-Shot Text-to-Image Generation
arXiv: 2102.12092

International Conference on Machine Learning (ICML), 2021
24 February 2021
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
    VLM

Papers citing "Zero-Shot Text-to-Image Generation"

50 / 3,689 papers shown
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data
Yaobin Ling
Xiaoqian Jiang
Yejin Kim
GAN SyDa
573
10
0
10 Apr 2026
Reason-SVG: Enhancing Structured Reasoning for Vector Graphics Generation with Reinforcement Learning
Ximing Xing
Yandong Guan
Jing Zhang
Dong Xu
Qian Yu
AI4TS LRM
294
6
0
10 Apr 2026
Distilling Specialized Orders for Visual Generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
Marco Pedersoli
VGen VLM
331
1
0
10 Apr 2026
Diffusion Language Models Know the Answer Before Decoding
Pengxiang Li
Yefan Zhou
Dilxat Muhtar
L. Yin
Shilin Yan
Li Shen
Yi Liang
Soroush Vosoughi
AI4CE
283
39
0
10 Apr 2026
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming Sun
Chao Zhou
Jihong Zhu
286
7
0
30 Mar 2026
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
Haowen Liu
Shaoxiong Yao
Haonan Chen
Jiawei Gao
Jiayuan Mao
Jia-Bin Huang
Yilun Du
LM&Ro LRM
253
1
0
05 Dec 2025
ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
Jiangtong Tan
Lin Liu
Jie Huanng
Xiaopeng Zhang
Qi Tian
Feng Zhao
MoE
57
1
0
05 Dec 2025
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Elisabetta Fedele
Francis Engelmann
Ian Huang
Or Litany
Marc Pollefeys
Leonidas Guibas
3DGS
302
1
0
05 Dec 2025
Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens
Ziran Qin
Youru Lv
Mingbao Lin
Zeren Zhang
Chanfan Gan
Tieyuan Chen
W. Lin
DiffM VLM
159
1
0
04 Dec 2025
Rethinking the Use of Vision Transformers for AI-Generated Image Detection
NaHyeon Park
Kunhee Kim
Junsuk Choe
Hyunjung Shim
DiffM
226
1
0
04 Dec 2025
SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
Yu Yuan
Tharindu Wickremasinghe
Zeeshan Nadir
Xijun Wang
Yiheng Chi
Stanley H. Chan
VGen LRM
318
1
0
03 Dec 2025
ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text
Kerry Luo
Michael Fu
Joshua Peguero
Husnain Malik
Anvay Patil
Joyce Lin
Megan Van Overborg
Ryan Sarmiento
Kevin Zhu
81
2
0
02 Dec 2025
Understanding and Harnessing Sparsity in Unified Multimodal Models
Shwai He
Chaorui Deng
Ang Li
Shen Yan
MoE
281
2
0
02 Dec 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Juanxi Tian
Siyuan Li
Conghui He
Lijun Wu
Cheng Tan
EGVM VGen
247
1
0
01 Dec 2025
Accelerating Inference of Masked Image Generators via Reinforcement Learning
Pranav Subbaraman
Shufan Li
Siyan Zhao
Aditya Grover
125
0
0
30 Nov 2025
FiCoTS: Fine-to-Coarse LLM-Enhanced Hierarchical Cross-Modality Interaction for Time Series Forecasting
Yafei Lyu
Hao Zhou
Lu Zhang
X. Yang
Zhiyong Liu
AI4TS
115
0
0
29 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
303
3
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
223
0
0
28 Nov 2025
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
Silin Cheng
Kai Han
MLLM VPVLM VLM
359
3
0
27 Nov 2025
Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models
Seoyun Yang
Gihoon Kim
Taesup Kim
132
0
0
27 Nov 2025
INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts
Anshul Bagaria
DiffM
141
0
0
27 Nov 2025
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park
Prin Phunyaphibarn
Phillip Y. Lee
Minhyuk Sung
186
0
0
26 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Wei Yin
Xiao-Xiao Long
Qian Zhang
Ping Tan
234
2
0
25 Nov 2025
Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion
Samuele Dell'Erba
Andrew D. Bagdanov
225
0
0
25 Nov 2025
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
Duy-Tung Pham
A. Nguyen
Viet-Hoang Tran
Nhan-Phu Chung
Xin T. Tong
T. Nguyen
Thieu N. Vo
117
0
0
25 Nov 2025
Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks
Xiangkai Ma
Han Zhang
Wenzhong Li
Sanglu Lu
AI4TS VGen
350
0
0
25 Nov 2025
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
Xuelu Feng
Yunsheng Li
Ziyu Wan
Zixuan Gao
Junsong Yuan
Dongdong Chen
Chunming Qiao
EGVM
343
1
0
25 Nov 2025
Single Image to High-Quality 3D Object via Latent Features
Huanning Dong
Yinuo Huang
Fan Li
Ping Kuang
3DV
376
0
0
24 Nov 2025
LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space
Hai Wu
Shuai Tang
Jiale Wang
Longkun Zou
Mingyue Guo
Rongqin Liang
Ke Chen
Yaowei Wang
204
2
0
24 Nov 2025
FineXtrol: Controllable Motion Generation via Fine-Grained Text
Keming Shen
Bizhu Wu
Junliang Chen
Xiaoqin Wang
Linlin Shen
VGen
187
3
0
24 Nov 2025
ConsistCompose: Unified Multimodal Layout Control for Image Composition
Xuanke Shi
B. Li
Xiaoyang Han
Zhongang Cai
Lei Yang
Dahua Lin
Quan-ding Wang
MLLM
448
2
0
23 Nov 2025
ViMix-14M: A Curated Multi-Source Video-Text Dataset with Long-Form, High-Quality Captions and Crawl-Free Access
Timing Yang
Sucheng Ren
Alan Yuille
Feng Wang
VGen
155
1
0
23 Nov 2025
Synthetic Curriculum Reinforces Compositional Text-to-Image Generation
Shijian Wang
Runhao Fu
Siyi Zhao
Qingqin Zhan
Xingjian Wang
Jiarui Jin
Yuan Lu
Hanqian Wu
Cunjian Chen
EGVM
265
0
0
23 Nov 2025
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
Seulgi Jeong
Jaeil Kim
DiffM
181
0
0
22 Nov 2025
EvDiff: High Quality Video with an Event Camera
Weilun Li
Lei-huan Sun
Ruixi Gao
Qi Jiang
Yuqin Ma
Kaiwei Wang
M. Yang
Luc Van Gool
D. Paudel
DiffM VGen
234
0
0
21 Nov 2025
Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation
Aniketh Iyengar
Jiaqi Han
Boris Ruf
Vincent Grari
Marcin Detyniecki
Stefano Ermon
DiffM
277
0
0
21 Nov 2025
PEPPER: Perception-Guided Perturbation for Robust Backdoor Defense in Text-to-Image Diffusion Models
Oscar Chew
Po-Yi Lu
Jayden Lin
Kuan-Hao Huang
Hsuan-Tien Lin
SILM
264
0
0
20 Nov 2025
AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers
Boxun Xu
Yu Wang
Zihu Wang
Peng Li
VLM
334
2
0
20 Nov 2025
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Yuxuan Gu
Weimin Bai
Yifei Wang
Weijian Luo
H. Sun
DiffM
332
0
0
19 Nov 2025
Taming Generative Synthetic Data for X-ray Prohibited Item Detection
Jialong Sun
Hongguang Zhu
Weizhe Liu
Yunda Sun
Renshuai Tao
Y. X. Wei
189
0
0
19 Nov 2025
SplitFlux: Learning to Decouple Content and Style from a Single Image
Yitong Yang
Y Samuel Wang
Changshuo Wang
Yongjun Zhang
Ziyang Chen
Shuting He
284
2
0
19 Nov 2025
Coffee: Controllable Diffusion Fine-tuning
Ziyao Zeng
Jingcheng Ni
Ruyi Liu
Alex Wong
DiffM
236
1
0
18 Nov 2025
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model
Y. Yang
Zhi Cen
Sida Peng
Xiangwei Chen
Yifu Deng
Xinyu Zhu
Fan Jia
Xiaowei Zhou
Hujun Bao
DiffM VGen
376
0
0
18 Nov 2025
DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection
Jiazhen Yan
Wandi Qiao
Fan Wang
Boyu Wang
Zhangjie Fu
273
0
0
17 Nov 2025
Infinite-Story: A Training-Free Consistent Text-to-Image Generation
Jihun Park
Kyoungmin Lee
Jongmin Gim
Hyeonseo Jo
Minseok Oh
Wonhyeok Choi
K. Hwang
Jaeyeul Kim
Minwoo Choi
S. Im
160
1
1
17 Nov 2025
Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts
Sheng Liu
Yuanzhi Liang
Jiepeng Wang
Sidan Du
C. Zhang
Xuelong Li
236
3
0
17 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
196
2
0
17 Nov 2025
ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation
Kaixin Zhang
Ruiqing Yang
Yuan Zhang
Shan You
Tao Huang
VLM
187
1
0
17 Nov 2025
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
Yijie Guo
Dexiang Hong
Weidong Chen
Zihan She
Cheng Ye
Xiaojun Chang
Zhendong Mao
163
0
0
16 Nov 2025
Point Cloud Quantization through Multimodal Prompting for 3D Understanding
Hongxuan Li
Wencheng Zhu
Huiying Xu
Xinzhong Zhu
Q. Hu
MQ 3DPC
517
0
0
15 Nov 2025
Page 1 of 74