Hierarchical Text-Conditional Image Generation with CLIP Latents

13 April 2022

Papers citing "Hierarchical Text-Conditional Image Generation with CLIP Latents"

50 / 4,735 papers shown

Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2 Ali Borji DiffM 75 117 0 02 Oct 2022
ManiCLIP: Multi-Attribute Face Manipulation from Text Hao Wang Guosheng Lin A. Molino Anran Wang Jiashi Feng Zehuan Yuan CVBM 33 9 0 02 Oct 2022
Ten Years after ImageNet: A 360° Perspective on AI S. Chawla Preslav Nakov Ahmed Ali Wendy Hall Issa M. Khalil Xiaosong Ma H. Sencar Ingmar Weber Michael Wooldridge Tingyue Yu 18 0 0 01 Oct 2022
Equivariant Energy-Guided SDE for Inverse Molecular Design Fan Bao Min Zhao Zhongkai Hao Pei‐Yun Li Chongxuan Li Jun Zhu DiffM 170 62 0 30 Sep 2022
AudioGen: Textually Guided Audio Generation Felix Kreuk Gabriel Synnaeve Adam Polyak Uriel Singer Alexandre Défossez Jade Copet Devi Parikh Yaniv Taigman Yossi Adi DiffM 17 288 0 30 Sep 2022
Data Poisoning Attacks Against Multimodal Encoders Ziqing Yang Xinlei He Zheng Li Michael Backes Mathias Humbert Pascal Berrang Yang Zhang AAML 106 43 0 30 Sep 2022
Diffusion-based Image Translation using Disentangled Style and Content Representation Gihyun Kwon Jong Chul Ye DiffM 147 149 0 30 Sep 2022
Mind Reader: Reconstructing complex images from brain activities Sikun Lin Thomas C. Sprague Ambuj K. Singh DiffM 116 86 0 30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models Han-Hung Lee Angel X. Chang 14 63 0 30 Sep 2022
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model Zhuoran Qiao Weili Nie Arash Vahdat Thomas F. Miller Anima Anandkumar DiffM 23 84 0 30 Sep 2022
DreamFusion: Text-to-3D using 2D Diffusion Ben Poole Ajay Jain Jonathan T. Barron B. Mildenhall 26 2,302 0 29 Sep 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Yanmin Wu Xinhua Cheng Renrui Zhang Zesen Cheng Jian Zhang 48 62 0 29 Sep 2022
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus Gang Li Yang Li 14 63 0 29 Sep 2022
Human Motion Diffusion Model Guy Tevet Sigal Raab Brian Gordon Yonatan Shafir Daniel Cohen-Or Amit H. Bermano DiffM VGen 188 713 0 29 Sep 2022
Analyzing Diffusion as Serial Reproduction Raja Marjieh Ilia Sucholutsky Thomas A. Langlois Nori Jacoby Thomas L. Griffiths DiffM 33 4 0 29 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data Uriel Singer Adam Polyak Thomas Hayes Xiaoyue Yin Jie An ... Oron Ashual Oran Gafni Devi Parikh Sonal Gupta Yaniv Taigman DiffM VGen 22 1,339 0 29 Sep 2022
Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling Huayu Chen Cheng Lu Chengyang Ying Hang Su Jun Zhu DiffM OffRL 85 103 0 29 Sep 2022
Re-Imagen: Retrieval-Augmented Text-to-Image Generator Wenhu Chen Hexiang Hu Chitwan Saharia William W. Cohen VLM 114 159 0 29 Sep 2022
Compositional Score Modeling for Simulation-based Inference Tomas Geffner George Papamakarios A. Mnih 60 24 0 28 Sep 2022
What Does DALL-E 2 Know About Radiology? Lisa Christine Adams Felix Busch Daniel Truhn Marcus R. Makowski Hugo J. W. L. Aerts Keno K. Bressem MedIm 34 57 0 27 Sep 2022
Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion Nisha Huang Fan Tang Weiming Dong Changsheng Xu DiffM 58 40 0 27 Sep 2022
Learning to Learn with Generative Models of Neural Network Checkpoints William S. Peebles Ilija Radosavovic Tim Brooks Alexei A. Efros Jitendra Malik UQCV 73 64 0 26 Sep 2022
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts Joel Jang Seonghyeon Ye Minjoon Seo ELM LRM 87 64 0 26 Sep 2022
A Collaborative, Interactive and Context-Aware Drawing Agent for Co-Creative Design F. Ibarrola Tomas Lawton Kazjon Grace 30 12 0 26 Sep 2022
Convergence of score-based generative modeling for general data distributions Holden Lee Jianfeng Lu Yixin Tan DiffM 177 128 0 26 Sep 2022
All are Worth Words: A ViT Backbone for Diffusion Models Fan Bao Shen Nie Kaiwen Xue Yue Cao Chongxuan Li Hang Su Jun Zhu VLM 9 312 0 25 Sep 2022
Best Prompts for Text-to-Image Models and How to Find Them Nikita Pavlichenko Dmitry Ustalov DiffM 12 58 0 23 Sep 2022
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions Sitan Chen Sinho Chewi Jungshian Li Yuanzhi Li Adil Salim Anru R. Zhang DiffM 123 245 0 22 Sep 2022
MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation Junyoung Seo Gyuseong Lee Seokju Cho Jiyoung Lee Seung Wook Kim DiffM 21 27 0 22 Sep 2022
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation Robin Zbinden 17 3 0 22 Sep 2022
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation Zhaoxi Chen Guangcong Wang Ziwei Liu 88 30 0 20 Sep 2022
Extremely Simple Activation Shaping for Out-of-Distribution Detection Andrija Djurisic Nebojsa Bozanic Arjun Ashok Rosanne Liu OODD 152 146 0 20 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks Abhishek Aich Calvin-Khang Ta Akash Gupta Chengyu Song S. Krishnamurthy M. Salman Asif A. Roy-Chowdhury AAML 36 17 0 20 Sep 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation Chuanxia Zheng L. Vuong Jianfei Cai Dinh Q. Phung MQ 58 72 0 19 Sep 2022
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis Lukas Struppek Dominik Hintersdorf Felix Friedrich Manuel Brack P. Schramowski Kristian Kersting 66 24 0 19 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation David M. Chan Yiming Ni David A. Ross Sudheendra Vijayanarasimhan Austin Myers John F. Canny 35 4 0 15 Sep 2022
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models Manli Shu Weili Nie De-An Huang Zhiding Yu Tom Goldstein Anima Anandkumar Chaowei Xiao VLM VPVLM 175 278 0 15 Sep 2022
Does CLIP Know My Face? Dominik Hintersdorf Lukas Struppek Manuel Brack Felix Friedrich P. Schramowski Kristian Kersting VLM 13 9 0 15 Sep 2022
Brain Imaging Generation with Latent Diffusion Models W. H. Pinaya Petru-Daniel Tudosiu J. Dafflon P. F. D. Costa Virginia Fernandez P. Nachev Sebastien Ourselin M. Jorge Cardoso DiffM MedIm 87 275 0 15 Sep 2022
M^4I: Multi-modal Models Membership Inference Pingyi Hu Zihan Wang Ruoxi Sun Hu Wang Minhui Xue 29 25 0 15 Sep 2022
Lossy Image Compression with Conditional Diffusion Models Ruihan Yang Stephan Mandt DiffM 9 123 0 14 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans John J. Nay ELM AILaw 84 26 0 14 Sep 2022
Soft Diffusion: Score Matching for General Corruptions Giannis Daras M. Delbracio Hossein Talebi A. Dimakis P. Milanfar DiffM 55 105 0 12 Sep 2022
Diffusion Models in Vision: A Survey Florinel-Alin Croitoru Vlad Hondru Radu Tudor Ionescu M. Shah DiffM VLM MedIm 188 1,098 0 10 Sep 2022
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation Zhengzhe Liu Peng Dai Ruihui Li Xiaojuan Qi Chi-Wing Fu DiffM 173 25 0 09 Sep 2022
TEACH: Temporal Action Composition for 3D Humans Nikos Athanasiou Mathis Petrovich Michael J. Black Gül Varol 78 138 0 09 Sep 2022
Dr. Neurosymbolic, or: How I Learned to Stop Worrying and Accept Statistics Masataro Asai 12 0 0 08 Sep 2022
Text-Free Learning of a Natural Language Interface for Pretrained Face Generators Xiaodan Du Raymond A. Yeh Nicholas I. Kolkin Eli Shechtman Gregory Shakhnarovich CLIP 16 1 0 08 Sep 2022
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow Xingchao Liu Chengyue Gong Qiang Liu OOD 20 822 0 07 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision Lei Zhang H. Shum VLM SSL 8 2 0 06 Sep 2022