2204.08583
Cited By
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
18 April 2022
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
Papers citing
"VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance"
50 / 255 papers shown
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
J. Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
68
0
0
30 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
48
0
0
28 Apr 2025
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
Cailin Zhuang
Yaoqi Hu
X. Zhang
Wei Cheng
Jiacheng Bao
Shengqi Liu
Yiying Yang
Xianfang Zeng
Gang Yu
Ming Li
3DGS
31
0
0
21 Apr 2025
DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration
Jiamei Xiong
Xuefeng Yan
Yongzhen Wang
Wei Zhao
Xiao-Ping Zhang
Mingqiang Wei
DiffM
24
0
0
07 Apr 2025
InstructVEdit: A Holistic Approach for Instructional Video Editing
Chi Zhang
C. Feng
Feng Yan
Qiming Zhang
Mingjin Zhang
Yujie Zhong
Jing Zhang
Lin Ma
DiffM
VGen
34
0
0
22 Mar 2025
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim
Hyunjung Shim
VLM
39
0
0
21 Mar 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
38
0
0
27 Feb 2025
Consistent estimation of generative model representations in the data kernel perspective space
Aranyak Acharyya
M. Trosset
Carey E. Priebe
Hayden Helm
DiffM
46
3
0
20 Jan 2025
Concept Matching with Agent for Out-of-Distribution Detection
YuXiao Lee
Xiaofeng Cao
Jingcai Guo
Wei Ye
Qing-Wu Guo
Yi Chang
40
0
0
08 Jan 2025
Can video generation replace cinematographers? Research on the cinematic language of generated video
X. Li
Kai WU
Siyi Yang
YiZhan Qu
Guohua Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen
DiffM
81
1
0
16 Dec 2024
Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
C. Conwell
Rupert Tawiah-Quashie
T. Ullman
66
0
0
26 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
H. Zhang
Yueting Zhuang
DiffM
95
15
0
24 Nov 2024
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei
Zheyang Xiong
Weiming Ren
Xinrun Du
Ge Zhang
Wenhu Chen
84
18
0
11 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
43
9
0
08 Nov 2024
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
Jinghao Hu
Yuhe Zhang
Guohua Geng
Liuyuxin Yang
JiaRui Yan
Jingtao Cheng
YaDong Zhang
Kang Li
DiffM
24
0
0
24 Oct 2024
Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models
Makram Chahine
Alex Quach
Alaa Maalouf
T. Wang
Daniela Rus
18
0
0
16 Oct 2024
PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM
Stefan Stefanache
Lluís Pastor Pérez
Julen Costa Watanabe
Ernesto Sanchez Tejedor
Thomas Hofmann
Enis Simsar
EGVM
13
0
0
08 Oct 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang
Chengyu Wang
Kunzhe Huang
Jun Huang
Lianwen Jin
CLIP
VLM
14
2
0
01 Oct 2024
CusConcept: Customized Visual Concept Decomposition with Diffusion Models
Zhi Xu
Shaozhe Hao
Kai Han
DiffM
20
4
0
01 Oct 2024
DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring
Ling Wang
Chen Wu
Lin Wang
24
0
0
20 Sep 2024
Mixture of Prompt Learning for Vision Language Models
Yu Du
Tong Niu
Rong Zhao
VLM
16
0
0
18 Sep 2024
InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
Yan Zheng
Lemeng Wu
DiffM
MDE
13
0
0
18 Sep 2024
360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation
Hai Wang
Jing-Hao Xue
25
0
0
12 Sep 2024
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
Geuntaek Lim
Hyunwoo Kim
Joonsoo Kim
Yukyung Choi
15
0
0
12 Aug 2024
Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI
Robert Wolfe
Aayushi Dangol
Alexis Hiniker
Bill Howe
18
0
0
04 Aug 2024
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
Xiang Gao
Jiaying Liu
35
2
0
02 Aug 2024
Few-shot Defect Image Generation based on Consistency Modeling
Qingfeng Shi
Jing Wei
Fei Shen
Zheng Zhang
24
2
0
01 Aug 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
32
6
0
29 Jul 2024
Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer
Yang Wu
Kaihua Zhang
Jianjun Qian
Jin Xie
Jian Yang
DiffM
24
4
0
29 Jul 2024
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
12
4
0
21 Jul 2024
Training-Free Large Model Priors for Multiple-in-One Image Restoration
Xuanhua He
Lang Li
Yingying Wang
Hui Zheng
Ke Cao
K. Yan
Rui Li
Chengjun Xie
Jie Zhang
Man Zhou
DiffM
34
0
0
18 Jul 2024
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao
Kai Han
Zhengyao Lv
Shihao Zhao
Kwan-Yee K. Wong
DiffM
CoGe
27
0
0
09 Jul 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
24
2
0
07 Jul 2024
Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
Xiang Gao
Zhengbo Xu
Junhan Zhao
Jiaying Liu
DiffM
21
8
0
03 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
35
0
0
01 Jul 2024
Fairness and Bias in Multimodal AI: A Survey
Tosin P. Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
43
9
0
27 Jun 2024
On Discrete Prompt Optimization for Diffusion Models
Ruochen Wang
Ting Liu
Cho-Jui Hsieh
Boqing Gong
DiffM
18
1
0
27 Jun 2024
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
Youngmin Kim
Saejin Kim
Hoyeon Moon
Youngjae Yu
Junhyug Noh
MedIm
21
0
0
25 Jun 2024
TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
Wenxue Li
Xinyu Xiong
Peng Xia
Lie Ju
Zongyuan Ge
MedIm
23
9
0
22 Jun 2024
AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
Lianyu Pang
Jian Yin
Baoquan Zhao
Feize Wu
Fu Lee Wang
Qing Li
Xudong Mao
DiffM
19
1
0
07 Jun 2024
Creative Text-to-Audio Generation via Synthesizer Programming
Manuel Cherep
Nikhil Singh
Jessica Shand
23
1
0
01 Jun 2024
Topological Perspectives on Optimal Multimodal Embedding Spaces
Abdul Aziz
Abdul Rahim
BDL
21
0
0
29 May 2024
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
Jiawei Ma
Yulei Niu
Shiyuan Huang
G. Han
Shih-Fu Chang
VLM
21
1
0
28 May 2024
PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu
Ziyun Zeng
Hang Hua
Jianlong Fu
Jiebo Luo
MLLM
DiffM
VLM
24
3
0
27 May 2024
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
Ling Yang
Bo-Wen Zeng
Jiaming Liu
Hong Li
Minghao Xu
Wentao Zhang
Shuicheng Yan
DiffM
21
9
0
23 May 2024
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Ying Jin
Pengyang Ling
Xiao-wen Dong
Pan Zhang
Jiaqi Wang
Dahua Lin
24
2
0
18 May 2024
Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp
Rachel Hong
William Agnew
Tadayoshi Kohno
Jamie Morgenstern
17
3
0
13 May 2024
Probing Multimodal LLMs as World Models for Driving
Shiva Sreeram
T. Wang
Alaa Maalouf
Guy Rosman
S. Karaman
Daniela Rus
20
7
0
09 May 2024
Dual-Modal Prompting for Sketch-Based Image Retrieval
Liying Gao
Bingliang Jiao
Peng Wang
Shizhou Zhang
Hanwang Zhang
Yanning Zhang
VLM
44
0
0
29 Apr 2024
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman
Noam Rotstein
Roy Ganz
Ron Kimmel
DiffM
23
14
0
28 Apr 2024