2211.09800
InstructPix2Pix: Learning to Follow Image Editing Instructions
Computer Vision and Pattern Recognition (CVPR), 2023
17 November 2022
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
Papers citing "InstructPix2Pix: Learning to Follow Image Editing Instructions"
50 / 1,731 papers shown
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Xin Luo
Jiahao Wang
Chenyuan Wu
Shitao Xiao
Xiyan Jiang
Defu Lian
Jiajun Zhang
Dong Liu
Zheng Liu
OffRL
197
13
0
28 Sep 2025
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Debargha Ganguly
Sumit Kumar
Ishwar B Balappanawar
Weicong Chen
Shashank Kambhatla
Srinivasan Iyengar
Shivkumar Kalyanaraman
Ponnurangam Kumaraguru
Vipin Chaudhary
VLM
186
0
0
26 Sep 2025
Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance
Luc Boudier
Loris Manganelli
Eleftherios Tsonis
Nicolas Dufour
Vicky Kalogeiton
DiffM
VLM
113
1
0
26 Sep 2025
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
Jialiang Li
Wenzheng Wu
Gaojing Zhang
Yifan Han
Wenzhao Lian
LM&Ro
132
0
0
26 Sep 2025
FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
Junyi Wu
Zhiteng Li
Haotong Qin
Xiaohong Liu
Linghe Kong
Yulun Zhang
Xiaokang Yang
DiffM
244
0
0
26 Sep 2025
TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
Qihang Wang
Yaxiong Wang
Lechao Cheng
Zhun Zhong
DiffM
104
0
0
26 Sep 2025
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Shilin Lu
Zhuming Lian
Zihan Zhou
Shaocong Zhang
Chen Zhao
A. Kong
314
11
0
25 Sep 2025
UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition
Guojun Lei
Rong Zhang
Chi-Yin Wang
Tianhang Liu
Hong Li
Zhiyuan Ma
W. Xu
VGen
154
0
0
25 Sep 2025
Guiding Audio Editing with Audio Language Model
Zitong Lan
Yiduo Hao
Mingmin Zhao
DiffM
KELM
169
4
0
25 Sep 2025
Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
S. Kasaei
Ali Aghayari
Arash Marioriyad
Niki Sepasian
MohammadAmin Fazli
Mahdieh Soleymani Baghshah
M. Rohban
EGVM
247
0
0
25 Sep 2025
CAMILA: Context-Aware Masking for Image Editing with Language Alignment
Hyunseung Kim
Chiho Choi
Srikanth Malla
Sai Prahladh Padmanabhan
Saurabh Bagchi
Joon Hee Choi
289
0
0
24 Sep 2025
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
European Conference on Computer Vision (ECCV), 2025
Zizheng Yang
Hu Yu
Bing Li
Jinghao Zhang
Jie Huang
Feng Zhao
222
10
0
24 Sep 2025
Towards Application Aligned Synthetic Surgical Image Synthesis
Danush Kumar Venkatesh
Stefanie Speidel
MedIm
128
0
0
23 Sep 2025
One-shot Embroidery Customization via Contrastive LoRA Modulation
Jun Ma
Qian He
Gaofeng He
Huang Chen
Chen Liu
Xiaogang Jin
Huamin Wang
DiffM
194
0
0
23 Sep 2025
Prompt-Guided Dual Latent Steering for Inversion Problems
Yichen Wu
Xu Liu
Chenxuan Zhao
Xinyu Wu
DiffM
LLMSV
186
4
0
23 Sep 2025
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
Teng Xiao
Zuchao Li
Lefei Zhang
182
1
0
23 Sep 2025
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Yanzuo Lu
Xin Xia
Manlin Zhang
Huafeng Kuang
Jianbin Zheng
Yuxi Ren
Xuefeng Xiao
191
6
0
23 Sep 2025
Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
Pin-Yen Chiu
I-Sheng Fang
Jun-Cheng Chen
DiffM
131
0
0
23 Sep 2025
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim
Heeseong Shin
Eunbeen Hong
Heeji Yoon
Anurag Arnab
Paul Hongsuck Seo
Sunghwan Hong
Seungryong Kim
195
6
0
22 Sep 2025
CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration
S. Kasaei
Ali Aghayari
Arash Marioriyad
Niki Sepasian
Shayan Baghayi Nejad
MohammadAmin Fazli
M. Baghshah
M. Rohban
DiffM
EGVM
248
0
0
22 Sep 2025
Degradation-Aware All-in-One Image Restoration via Latent Prior Encoding
S. Sharif
Abdur Rehman
Fayaz Ali Dharejo
Radu Timofte
R. A. Naqvi
DiffM
171
0
0
22 Sep 2025
Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation
Yue Ma
Zexuan Yan
Hongyu Liu
H. Wang
Heng Pan
...
H. Shum
Zhifeng Li
Wei Liu
Linfeng Zhang
Qifeng Chen
VGen
267
13
0
20 Sep 2025
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
Shaopeng Zhai
Qi Zhang
Tianyi Zhang
Fuxian Huang
Haoran Zhang
Ming Zhou
Shengzhe Zhang
Litao Liu
Sixu Lin
Jiangmiao Pang
OffRL
193
12
0
19 Sep 2025
UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets
Pengyu Wang
Shaojun Zhou
Chenkun Tan
Xinghao Wang
Wei Huang
Zhen Ye
Zhaowei Li
Botian Jiang
Dong Zhang
Xipeng Qiu
MLLM
141
0
0
18 Sep 2025
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Chau Pham
Quan Dao
Mahesh Bhosale
Yunjie Tian
Dimitris Metaxas
David Doermann
189
1
0
18 Sep 2025
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Mingsong Li
Lin Liu
Hongjun Wang
Haoxing Chen
Xijun Gu
Shizhan Liu
Dong Gong
Junbo Zhao
Zhenzhong Lan
Jianguo Li
144
0
0
18 Sep 2025
Controllable-Continuous Color Editing in Diffusion Model via Color Mapping
Yuqi Yang
Dongliang Chang
Yuanchen Fang
Yi-Zhe Song
Zhanyu Ma
Jun Guo
DiffM
KELM
148
0
0
17 Sep 2025
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
Tianyu Chen
Yasi Zhang
Zhi Zhang
Peiyu Yu
Shu Wang
...
Jianwen Xie
Oscar Leong
L. xilinx Wang
Ying Nian Wu
Mingyuan Zhou
143
0
0
16 Sep 2025
Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder
Qifei Jia
Yu Liu
Yajie Chai
Xintong Yao
Qiming Lu
Y. Zhang
Runyu Shi
Y. Huang
Guoquan Zhang
LM&Ro
126
2
0
16 Sep 2025
HoloGarment: 360° Novel View Synthesis of In-the-Wild Garments
J. Karras
Yingwei Li
Yasamin Jafarian
Ira Kemelmacher-Shlizerman
133
0
0
15 Sep 2025
Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness
Zixuan Fu
Yan Ren
Finn Carter
Chenyue Wen
Le Ku
Daheng Yu
Emily Davis
Bo Zhang
DiffM
316
0
0
15 Sep 2025
Mask Consistency Regularization in Object Removal
Hua Yuan
Jin Yuan
Yicheng Jiang
Yao Zhang
Xin Geng
Yong Rui
138
0
0
12 Sep 2025
Fine-Grained Customized Fashion Design with Image-into-Prompt benchmark and dataset from LMM
Hui Li
Yi You
Qiqi Chen
Bingfeng Zhang
George Q. Huang
56
0
0
11 Sep 2025
Target-oriented Multimodal Sentiment Classification with Counterfactual-enhanced Debiasing
Zhiyue Liu
Fanrong Ma
Xin Ling
82
0
0
11 Sep 2025
Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation
Kaleem Ahmad
MLLM
97
0
0
10 Sep 2025
Imagining Alternatives: Towards High-Resolution 3D Counterfactual Medical Image Generation via Language Guidance
Mohamed Mohamed
Brennan Nichyporuk
Douglas L. Arnold
Tal Arbel
DiffM
MedIm
153
0
0
07 Sep 2025
OmniStyle2: Scalable and High Quality Artistic Style Transfer Data Generation via Destylization
Ye Wang
Zili Yi
Yibo Zhang
Peng Zheng
Xuping Xie
Jiang Lin
Yilin Wang
Rui Ma
133
0
0
07 Sep 2025
AURAD: Anatomy-Pathology Unified Radiology Synthesis with Progressive Representations
Shuhan Ding
Jingjing Fu
Yu Gu
Naiteek Sangani
Mu-Hsin Wei
Paul Vozila
Nan Liu
Jiang Bian
Hoifung Poon
MedIm
198
0
0
05 Sep 2025
Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models
Kiymet Akdemir
Jing Shi
Kushal Kafle
Brian L. Price
Pinar Yanardag
DiffM
131
0
0
04 Sep 2025
Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
Jingyi Lu
Kai Han
DiffM
190
3
0
04 Sep 2025
SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer
Jimin Xu
Bosheng Qin
Tao Jin
Zhou Zhao
Zhenhui Ye
Jun-chen Yu
Fei Wu
3DGS
131
0
0
04 Sep 2025
From Editor to Dense Geometry Estimator
Jiyuan Wang
Chunyu Lin
Lei-huan Sun
Rongying Liu
Lang Nie
Mingxing Li
K. Liao
Xiangxiang Chu
Yao-Min Zhao
DiffM
MDE
212
7
0
04 Sep 2025
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model
Hongyang Wei
Baixin Xu
Hongbo Liu
Cyrus Wu
J. Liu
...
Ying He
Yang Liu
Xuchen Song
Eric Li
Y. Zhou
182
12
0
04 Sep 2025
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Hyunsoo Cha
Byungjun Kim
Hanbyul Joo
DiffM
138
0
0
04 Sep 2025
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li
Xinyu Peng
Y. Wang
Zelin Peng
Xin Chen
Rongxiang Weng
Jingang Wang
Xunliang Cai
Wenrui Dai
Hongkai Xiong
MLLM
OffRL
365
13
0
03 Sep 2025
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
Ziyun Zeng
Junhao Zhang
W. Li
Mike Zheng Shou
179
0
0
02 Sep 2025
Exploring Diffusion Models for Generative Forecasting of Financial Charts
Taegyeong Lee
Jiwon Park
Kyunga Bang
Seunghyun Hwang
Ung-Jin Jang
DiffM
64
0
0
02 Sep 2025
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Quan Dao
Xiaoxiao He
Ligong Han
Ngan Hoai Nguyen
Amin Heyrani Nobar
Faez Ahmed
Han Zhang
Viet Anh Nguyen
Dimitris N. Metaxas
DiffM
213
0
0
02 Sep 2025
Category-Aware 3D Object Composition with Disentangled Texture and Shape Multi-view Diffusion
Zeren Xiong
Zikun Chen
Zedong Zhang
Xiang Li
Ying Tai
Jian Yang
Jun Yu Li
DiffM
164
2
0
02 Sep 2025
PRINTER: Deformation-Aware Adversarial Learning for Virtual IHC Staining with In Situ Fidelity
Yizhe Yuan
Bingsen Xue
Bangzheng Pu
Chengxiang Wang
Cheng Jin
84
1
0
01 Sep 2025