ResearchTrend.AI
Hierarchical Text-Conditional Image Generation with CLIP Latents

13 April 2022
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
    VLM
    DiffM

Papers citing "Hierarchical Text-Conditional Image Generation with CLIP Latents"

50 / 4,735 papers shown
Title
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
21
0
0
13 May 2025
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou
Zeyu Li
Xingyu Jiang
H. Li
Lijun Yang
Wen Yao
Yue Deng
DiffM
27
0
0
12 May 2025
Pixel Motion as Universal Representation for Robot Control
Kanchana Ranasinghe
Xiang Li
Cristina Mata
J. Park
Michael S. Ryoo
VGen
11
0
0
12 May 2025
Addressing degeneracies in latent interpolation for diffusion models
Erik Landolsi
Fredrik Kahl
DiffM
29
0
0
12 May 2025
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Roy Betser
Meir Yossef Levi
Guy Gilboa
11
0
0
11 May 2025
Unsupervised Learning for Class Distribution Mismatch
Pan Du
Wangbo Zhao
Xinai Lu
Nian Liu
Z. Li
...
Suyun Zhao
H. Chen
Cuiping Li
Kai Wang
Yang You
11
0
0
11 May 2025
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation
Hang Wang
Zhi-Qi Cheng
Chenhao Lin
Chao Shen
Lei Zhang
DiffM
32
0
0
10 May 2025
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong
Qiaosong Qi
Yuanbin Wang
Anyi Rao
Biaolong Chen
Aixi Zhang
Si Liu
Hao Jiang
DiffM
VGen
20
0
0
10 May 2025
Learning Graph Representation of Agent Diffuser
Youcef Djenouri
Nassim Belmecheri
Tomasz Michalak
Jan Dubiński
Ahmed Nabil Belbachir
Anis Yazidi
AI4CE
21
0
0
10 May 2025
Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives
Chengyang He
Xu Liu
Gadiel Sznaier Camps
Guillaume Sartoretti
Mac Schwager
23
0
0
09 May 2025
Automated Learning of Semantic Embedding Representations for Diffusion Models
Limai Jiang
Yunpeng Cai
DiffM
23
0
0
09 May 2025
MAGE:A Multi-stage Avatar Generator with Sparse Observations
Fangyu Du
Yang Yang
Xuehao Gao
Hongye Hou
VGen
18
0
0
09 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Aaron C. Courville
Oriol Nieto
Prem Seetharaman
Justin Salamon
43
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
39
0
0
08 May 2025
Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting
Kazi Ashik Islam
Zakaria Mehrab
Mahantesh Halappanavar
H. Mortveit
Sridhar Katragadda
Jon Derek Loftis
Madhav V. Marathe
DiffM
AI4CE
37
0
0
08 May 2025
Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know
Shireen Kudukkil Manchingal
Fabio Cuzzolin
42
0
0
08 May 2025
ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model
Sagnik Bhattacharya
Abhiram Gorle
Ahmed Mohsin
Ahsan Bilal
Connor Ding
Amit Kumar Singh Yadav
Tsachy Weissman
DiffM
40
0
0
08 May 2025
PIDiff: Image Customization for Personalized Identities with Diffusion Models
Jinyu Gu
Haipeng Liu
M. Y. Wang
Y. Wang
48
0
0
08 May 2025
Prompt to Polyp: Medical Text-Conditioned Image Synthesis with Diffusion Models
Mikhail Chaichuk
Sushant Gautam
Steven A. Hicks
Elena Tutubalina
DiffM
MedIm
43
0
0
08 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Y. Li
Jiaheng Liu
X. Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
66
0
0
08 May 2025
MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models
Hongyang Zhu
Haipeng Liu
Bo Fu
Yang Wang
DiffM
28
0
0
08 May 2025
ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition
Zhiping Qiu
Yitong Jin
Y. Wang
Yi Shi
C. Wang
Chao Tan
Xiaobing Li
Feng Yu
Tao Yu
Qionghai Dai
21
0
0
07 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
M. Li
Lei Sun
X. Chu
DiffM
44
0
0
06 May 2025
Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators
Will Hawkins
Chris Russell
Brent Mittelstadt
DiffM
28
0
0
06 May 2025
Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks
Sun Haoxuan
Hong Yan
Zhan Jiahui
Chen Haoxing
Lan Jun
Zhu Huijia
Wang Weiqiang
Zhang Liqing
Zhang Jianfu
AAML
40
0
0
06 May 2025
PiCo: Enhancing Text-Image Alignment with Improved Noise Selection and Precise Mask Control in Diffusion Models
Chang Xie
Chenyi Zhuang
Pan Gao
VLM
22
0
0
06 May 2025
Distribution-Conditional Generation: From Class Distribution to Creative Generation
Fu Feng
Yucheng Xie
Xu Yang
Jing Wang
Xin Geng
DiffM
29
0
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
74
1
0
05 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
C. L. P. Chen
Sijie Zhu
DiffM
71
1
0
05 May 2025
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models
Kuofeng Gao
Yufei Zhu
Yiming Li
Jiawang Bai
Yong-Liang Yang
Z. Li
Shu-Tao Xia
34
0
0
05 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Y. Jiang
Qingyao Xu
L. Zhang
DiffM
44
0
0
05 May 2025
Efficient Multi Subject Visual Reconstruction from fMRI Using Aligned Representations
Christos Zangos
Danish Ebadulla
Thomas C. Sprague
Ambuj Singh
46
0
0
03 May 2025
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Daniele Molino
Francesco Di Feola
Linlin Shen
Paolo Soda
V. Guarrasi
MedIm
LM&MA
57
0
0
02 May 2025
Provable Efficiency of Guidance in Diffusion Models for General Data Distribution
Gen Li
Yuchen Jiao
44
0
0
02 May 2025
Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai
Yiyou Sun
Wei Cheng
Haifeng Chen
AAML
46
0
0
02 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
Y. Li
Y. Zhang
DiffM
73
0
0
01 May 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
J. Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
68
0
0
30 Apr 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
55
0
0
30 Apr 2025
Capturing Conditional Dependence via Auto-regressive Diffusion Models
Xunpeng Huang
Yujin Han
Difan Zou
Yian Ma
Tong Zhang
DiffM
54
0
0
30 Apr 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
78
0
0
30 Apr 2025
Partitioned Memory Storage Inspired Few-Shot Class-Incremental learning
Renye Zhang
Yimin Yin
Jinghua Zhang
CLL
47
0
0
29 Apr 2025
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking
Dayananda Herurkar
Ahmad Ali
Andreas Dengel
38
0
0
29 Apr 2025
Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion
Z. Wang
Alexandre Bruckert
P. Le Callet
Guangtao Zhai
VGen
32
0
0
29 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
73
0
0
28 Apr 2025
Open-set Anomaly Segmentation in Complex Scenarios
Song Xia
Yi Yu
Henghui Ding
Wenhan Yang
S. Liu
Alex C. Kot
Xudong Jiang
DiffM
50
0
0
28 Apr 2025
Masked Language Prompting for Generative Data Augmentation in Few-shot Fashion Style Recognition
Yuki Hirakawa
Ryotaro Shimizu
41
0
0
28 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
48
0
0
28 Apr 2025
CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes
Tuan Nguyen
Naseem Khan
Issa Khalil
AAML
52
0
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
50
0
0
26 Apr 2025
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding
Kun Li
J. Wang
Yangfan He
Xinyuan Song
Ruoyu Wang
...
K. Li
Sida Li
Miao Zhang
Tianyu Shi
Xueqian Wang
40
0
0
25 Apr 2025