ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.
InstructPix2Pix: Learning to Follow Image Editing Instructions

17 November 2022
Tim Brooks
Aleksander Holynski
Alexei A. Efros
    DiffM

Papers citing "InstructPix2Pix: Learning to Follow Image Editing Instructions"

50 / 281 papers shown
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
21
0
0
13 May 2025
Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
Unai Gurbindo
Axel Brando
Jaume Abella
Caroline König
16
0
0
13 May 2025
MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models
Hongyang Zhu
Haipeng Liu
Bo Fu
Yang Wang
DiffM
28
0
0
08 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
C. L. P. Chen
Sijie Zhu
DiffM
68
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
74
1
0
05 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
34
0
0
04 May 2025
Segment Any RGB-Thermal Model with Language-aided Distillation
Dong Xing
Xianxun Zhu
Wei Zhou
Qika Lin
Hang Yang
Yuqing Wang
VLM
49
0
0
04 May 2025
Rethinking Score Distilling Sampling for 3D Editing and Generation
Xingyu Miao
Haoran Duan
Yang Long
J. Han
39
0
0
03 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
Y. Li
Y. Zhang
DiffM
73
0
0
01 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
60
0
0
01 May 2025
Advance Fake Video Detection via Vision Transformers
Joy Battocchio
S. Dell’Anna
Andrea Montibeller
Giulia Boato
ViT
VGen
34
0
0
29 Apr 2025
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Zechuan Zhang
Ji Xie
Yu Lu
Zongxin Yang
Y. Yang
DiffM
89
1
0
29 Apr 2025
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer
Junpeng Jiang
Gangyi Hong
Miao Zhang
Hengtong Hu
Kun Zhan
Rui Shao
Liqiang Nie
VGen
49
0
0
28 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
73
0
0
28 Apr 2025
CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes
Tuan Nguyen
Naseem Khan
Issa Khalil
AAML
52
0
0
27 Apr 2025
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos
Yuan Li
Ziqian Bai
Feitong Tan
Zhaopeng Cui
S. Fanello
Yinda Zhang
DiffM
VGen
49
0
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
50
0
0
26 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
X. Zhang
Gang Yu
Daxin Jiang
DiffM
93
2
0
24 Apr 2025
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
Minkyu Choi
Sundar Sripada V. S.
Harsh Goel
Sahil Shah
Sandeep P. Chinchali
DiffM
VGen
79
0
0
24 Apr 2025
DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing
Aniruddha Bala
Rohit Chowdhury
Rohan Jaiswal
Siddharth Roheda
DiffM
AAML
62
0
0
24 Apr 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo
Liangbing Zhao
Sayak Paul
Yue Liao
Renrui Zhang
Yi Xin
Peng Gao
Mohamed Elhoseiny
H. Li
VLM
63
0
0
22 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
49
0
0
22 Apr 2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang
Lingen Li
Xuan Ju
Zhaoyang Zhang
C. Yuan
Ying Shan
DiffM
62
0
0
16 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
57
0
0
15 Apr 2025
Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes
Huijie Liu
Bingcan Wang
Jie Hu
Xiaoming Wei
Guoliang Kang
61
0
0
14 Apr 2025
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow
Kenan Tang
Yanhong Li
Yao Qin
DiffM
33
0
0
13 Apr 2025
Towards Explainable Partial-AIGC Image Quality Assessment
Jiaying Qian
Ziheng Jia
Zicheng Zhang
Zeyu Zhang
Guangtao Zhai
Xiongkuo Min
31
0
0
12 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
H. Wang
Jie Cao
Huaibo Huang
R. He
DiffM
68
0
0
10 Apr 2025
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu
Jaskirat Singh
Zhaoyuan Yang
Peter Tu
Jing Zhang
Hongdong Li
Richard Hartley
Dylan Campbell
DiffM
57
0
0
09 Apr 2025
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition
Rupayan Mallick
Sibo Dong
Nataniel Ruiz
Sarah Adel Bargal
DiffM
39
0
0
08 Apr 2025
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Yuandong Pu
Le Zhuo
Kaiwen Zhu
Liangbin Xie
Wenlong Zhang
Xiangyu Chen
Peng Gao
Yu Qiao
Chao Dong
Yihao Liu
MLLM
55
1
0
07 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLM
LRM
41
0
0
03 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Li Yuan
MLLM
EGVM
88
8
0
03 Apr 2025
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang
Y. Zhang
Yao Zhu
Jianing Li
Zizhe Wang
Y. Liu
Xiangyang Ji
38
0
0
31 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Z. Wang
Leonidas J. Guibas
A. Kadambi
VGen
3DGS
81
0
0
26 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
78
0
0
26 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou
J. Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
52
1
0
25 Mar 2025
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X Chen
Misha Sra
Pradeep Sen
50
0
0
24 Mar 2025
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
Zhengyuan Li
Kai Cheng
Anindita Ghosh
Uttaran Bhattacharya
Liangyan Gui
Aniket Bera
DiffM
VGen
37
0
0
23 Mar 2025
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma
Yiqing Li
Jiawei Wu
Xing Luo
Zhi Jin
DiffM
VGen
51
0
0
22 Mar 2025
Enhancing Product Search Interfaces with Sketch-Guided Diffusion and Language Agents
Edward Sun
DiffM
27
0
0
21 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
86
3
0
19 Mar 2025
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
Sungjae Lee
Yeonjoo Hong
Kwang In Kim
44
0
0
19 Mar 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-jui Fu
Yusu Qian
Chen Chen
Wenze Hu
Zhe Gan
Y. Yang
85
1
0
16 Mar 2025
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He
Yuwei Ning
Yipeng Qin
Wangrun Wang
Sibei Yang
Liang Lin
G. Li
53
1
0
15 Mar 2025
PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
H. Iqbal
Nazmul Karim
Umar Khalid
Azib Farooq
Z. Zhong
Jing Hua
Chen Chen
DiffM
3DGS
VGen
45
0
0
14 Mar 2025
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Hanyang Zhao
Haoxian Chen
Yucheng Guo
Genta Indra Winata
Tingting Ou
Ziyu Huang
D. Yao
Wenpin Tang
54
0
0
13 Mar 2025
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He
Bo Cheng
Yuhang Ma
Qingxiang Jia
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
DiffM
VLM
43
0
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
65
3
0
13 Mar 2025