2412.03255
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
4 December 2024
Qu He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
Donghao Luo
Wenshu Fan
Yun Wang
Chengjie Wang
Xuelong Li
Jing Zhang
DiffM
Papers citing "DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation"
50 / 63 papers shown
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
Qingdong He
Xueqin Chen
Chaoyi Wang
Yanjie Pan
Xiaobin Hu
Zhenye Gan
Yabiao Wang
Chengjie Wang
Xiangtai Li
J. Zhang
278
6
0
02 Jul 2025
PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation
Yanjie Pan
Qu He
Zhengkai Jiang
P. Xu
Chaoyi Wang
...
Yun Cao
Zhenye Gan
M. Chi
Bo Peng
Yun Wang
DiffM
415
5
0
09 Mar 2025
FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing
Tianshuo Yuan
Yuxiang Lin
Jue Wang
Zhi-Qi Cheng
Xiaolong Wang
Jiao GH
Wei Chen
Xiaojiang Peng
DiffM
266
4
0
22 Aug 2024
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Yanan Sun
Yanchen Liu
Yinhao Tang
Wenjie Pei
Kai Chen
DiffM
365
22
0
27 Jun 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
518
295
0
22 Apr 2024
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li
Taojiannan Yang
Huafeng Kuang
Jie Wu
Zhaoning Wang
Xuefeng Xiao
Chong Chen
333
177
0
11 Apr 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
370
135
0
08 Feb 2024
InstanceDiffusion: Instance-level Control for Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Wang
Trevor Darrell
Sai Saketh Rambhatla
Rohit Girdhar
Ishan Misra
VLM
DiffM
486
202
0
05 Feb 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang
Liangbin Xie
Xintao Wang
Ziyang Yuan
Xiaodong Cun
...
Jiantao Zhou
Chao Dong
Rui Huang
Ruimao Zhang
Ying Shan
DiffM
259
166
0
11 Dec 2023
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
Shufan Li
Harkanwar Singh
Aditya Grover
DiffM
309
23
0
11 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
1.8K
1,402
0
16 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
1.8K
685
0
14 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
750
4,820
0
05 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
International Conference on Learning Representations (ICLR), 2023
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
263
202
0
02 Oct 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
346
170
0
05 Sep 2023
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Hu Ye
Jun Zhang
Siyi Liu
Xiao Han
Wei Yang
DiffM
449
1,487
0
13 Aug 2023
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
IEEE International Conference on Computer Vision (ICCV), 2023
Jinheng Xie
Yuexiang Li
Yawen Huang
Haozhe Liu
Wentian Zhang
Yefeng Zheng
Mike Zheng Shou
DiffM
858
313
0
20 Jul 2023
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM
MLLM
378
136
0
16 Jul 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
International Conference on Learning Representations (ICLR), 2023
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
2.2K
4,427
0
04 Jul 2023
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang
Yi Zhang
Vibhav Vineet
Neel Joshi
Xin Eric Wang
DiffM
440
62
0
29 May 2023
Generating Images with Multimodal Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
477
365
0
26 May 2023
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Shihao Zhao
Dongdong Chen
Yen-Chun Chen
Jianmin Bao
Shaozhe Hao
Lu Yuan
Kwan-Yee K. Wong
492
435
0
25 May 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
Neural Information Processing Systems (NeurIPS), 2023
Can Qin
Shu Zhen Zhang
Ning Yu
Yihao Feng
Xinyi Yang
...
Caiming Xiong
Silvio Savarese
Stefano Ermon
Yun Fu
Ran Xu
531
217
0
18 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
606
3,021
0
20 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
1.4K
8,828
0
17 Apr 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Ming Cao
Xintao Wang
Chen Ma
Ying Shan
Xiaohu Qie
Yinqiang Zheng
DiffM
316
763
0
17 Apr 2023
HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Xu Ju
Ailing Zeng
Chenchen Zhao
Jianan Wang
Lei Zhang
Qian Xu
DiffM
289
133
0
09 Apr 2023
Training-Free Layout Control with Cross-Attention Guidance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
553
350
0
06 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
20.2K
19,316
0
27 Feb 2023
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Chong Mou
Xintao Wang
Liangbin Xie
Yanze Wu
Shuai Liu
Chen Ma
Ying Shan
Xiaohu Qie
DiffM
655
1,603
0
16 Feb 2023
Adding Conditional Control to Text-to-Image Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
1.2K
6,666
1
10 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.6K
7,623
0
30 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
624
883
1
17 Jan 2023
ReCo: Region-Controlled Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Linjie Li
Kevin Qinghong Lin
...
Nan Duan
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
DiffM
347
210
0
23 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
Computer Vision and Pattern Recognition (CVPR), 2022
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
1.6K
2,834
0
17 Nov 2022
Imagic: Text-Based Real Image Editing with Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Bahjat Kawar
Shiran Zada
Oran Lang
Omer Tov
Hui-Tang Chang
Tali Dekel
Inbar Mosseri
Michal Irani
891
1,435
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
1.5K
4,964
0
16 Oct 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
1.5K
4,101
0
25 Aug 2022
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
International Conference on Learning Representations (ICLR), 2022
Rinon Gal
Yuval Alaluf
Yuval Atzmon
Or Patashnik
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
779
2,652
0
02 Aug 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
710
5,964
0
26 Jul 2022
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
Reza Yazdani Aminabadi
Samyam Rajbhandari
Minjia Zhang
A. A. Awan
Cheng-rong Li
...
Elton Zheng
Jeff Rasley
Shaden Smith
Olatunji Ruwase
Yuxiong He
525
561
0
30 Jun 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Neural Information Processing Systems (NeurIPS), 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
1.5K
8,076
0
23 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
1.5K
8,816
0
13 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2021
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
4.8K
23,580
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
International Conference on Machine Learning (ICML), 2021
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
1.4K
4,672
0
20 Dec 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
995
1,808
0
03 Nov 2021
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Chenlin Meng
Yutong He
Yang Song
Jiaming Song
Jiajun Wu
Jun-Yan Zhu
Stefano Ermon
DiffM
856
2,070
0
02 Aug 2021
Variational Diffusion Models
Diederik P. Kingma
Tim Salimans
Ben Poole
Jonathan Ho
DiffM
1.1K
1,448
0
01 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
1.9K
17,979
0
17 Jun 2021
Diffusion Models Beat GANs on Image Synthesis
Neural Information Processing Systems (NeurIPS), 2021
Prafulla Dhariwal
Alex Nichol
3.9K
11,425
0
11 May 2021