2407.05600
Cited By
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
8 July 2024
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
Papers citing
"GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"
22 / 22 papers shown
Title
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills
Niladri Shekhar Dutt
Duygu Ceylan
Niloy J. Mitra
DiffM
22
0
0
09 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
Qianqian Sun
Jixiang Luo
Dell Zhang
Xuelong Li
DiffM
50
0
0
17 Apr 2025
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
23
0
0
15 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
H. Wang
Jie Cao
Huaibo Huang
R. He
DiffM
68
0
0
10 Apr 2025
POEM: Precise Object-level Editing via MLLM control
Marco Schouten
Mehmet Onurcan Kaya
Serge Belongie
Dim P. Papadopoulos
DiffM
68
0
0
10 Apr 2025
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
Kavana Venkatesh
Connor Dunlop
Pinar Yanardag
DiffM
23
0
0
07 Apr 2025
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
64
0
0
25 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
56
1
0
15 Mar 2025
EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation
Qi Mao
Haobo Hu
Yujie He
Difei Gao
Haokun Chen
Libiao Jin
DiffM
40
0
0
14 Mar 2025
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li
Kahou Chan
Yue Sun
C. Lam
Tong Tong
Zitong Yu
Keren Fu
Xiaohong Liu
Tao Tan
DiffM
36
0
0
13 Mar 2025
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Advait Gupta
NandaKiran Velaga
Dang Nguyen
Tianyi Zhou
DiffM
56
0
0
13 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
45
0
0
12 Mar 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM
CoGe
62
2
0
08 Jan 2025
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
92
3
0
28 Nov 2024
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Xinchen Zhang
Ling Yang
G. Li
Yaqi Cai
Jiake Xie
Yong Tang
Yujiu Yang
Mengdi Wang
Bin Cui
EGVM
CoGe
28
5
0
09 Oct 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
59
30
0
24 Jun 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
48
22
0
18 Jan 2024
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu
Long Lian
Joseph E. Gonzalez
Boyi Li
Trevor Darrell
60
14
0
27 Nov 2023
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang
Huajian Xin
Chuanyang Zheng
Lin Li
Zhengying Liu
...
Enze Xie
Jian Yin
Zhenguo Li
Heng Liao
Xiaodan Liang
LRM
39
61
0
01 Oct 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
105
235
0
16 Jun 2023
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
124
217
0
06 Apr 2023