ResearchTrend.AI
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models


26 May 2025
Lorenzo Baraldi
Davide Bucciarelli
Federico Betti
Marcella Cornia
Lorenzo Baraldi
N. Sebe
Rita Cucchiara

Papers cited by "What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models"

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
157
1,368
0
22 Jan 2025
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Yiwei Ma
Jiayi Ji
Ke Ye
Weihuang Lin
Zhibin Wang
Yonghan Zheng
Qiang-feng Zhou
Xiaoshuai Sun
Rongrong Ji
54
8
0
26 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
81
69
0
22 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
61
115
0
09 Aug 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma
Yi Jiang
Jiannan Wu
Zehuan Yuan
Xiaojuan Qi
VLM
ObjD
47
56
0
19 Apr 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Mude Hui
Siwei Yang
Bingchen Zhao
Yichun Shi
Heng Wang
Peng Wang
Yuyin Zhou
Cihang Xie
67
61
0
15 Apr 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
203
1,187
0
05 Mar 2024
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Shifeng Chen
Liangliang Cao
EGVM
96
90
0
27 Feb 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
83
42
0
26 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
72
47
0
19 Feb 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang
Liangbin Xie
Xintao Wang
Ziyang Yuan
Xiaodong Cun
...
Jiantao Zhou
Chao Dong
Rui Huang
Ruimao Zhang
Ying Shan
DiffM
51
68
0
11 Dec 2023
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
78
46
0
20 Nov 2023
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin
Adam Polyak
Uriel Singer
Yuval Kirstain
Amit Zohar
Oron Ashual
Devi Parikh
Yaniv Taigman
35
139
0
16 Nov 2023
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Anton Razzhigaev
Arseniy Shakhmatov
Anastasia Maltseva
V.Ya. Arkhipkin
Igor Pavlov
Ilya Ryabov
Angelina Kuts
Alexander Panchenko
Andrey Kuznetsov
Denis Dimitrov
85
81
0
05 Oct 2023
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Tsu-Jui Fu
Wenze Hu
Xianzhi Du
William Yang Wang
Yinfei Yang
Zhe Gan
49
93
0
29 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM
VLM
64
96
0
07 Sep 2023
LISA: Reasoning Segmentation via Large Language Model
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
80
424
0
01 Aug 2023
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
Federico Betti
Jacopo Staiano
Lorenzo Baraldi
Lorenzo Baraldi
Rita Cucchiara
N. Sebe
EGVM
37
7
0
18 Jul 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
162
2,242
0
04 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Keqin Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
58
622
0
27 Jun 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
143
254
0
16 Jun 2023
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers
Artidoro Pagnoni
Ari Holtzman
Luke Zettlemoyer
ALM
106
2,454
0
23 May 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
332
4,506
0
17 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
242
3,205
0
14 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
70
1,028
0
27 Mar 2023
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhen Zhang
Xinyi Yang
Yihao Feng
Can Qin
Chia-Chih Chen
...
Haiquan Wang
Silvio Savarese
Stefano Ermon
Caiming Xiong
Ran Xu
47
109
0
16 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
422
13,788
0
15 Mar 2023
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
142
1,745
0
17 Nov 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
76
310
0
12 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
570
9,009
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
248
15,081
0
20 Dec 2021
Resolution-robust Large Mask Inpainting with Fourier Convolutions
Roman Suvorov
Elizaveta Logacheva
Anton Mashikhin
Anastasia Remizova
Arsenii Ashukha
Aleksei Silvestrov
Naejin Kong
Harshith Goka
Kiwoong Park
Victor Lempitsky
74
837
0
15 Sep 2021
Learning by Planning: Language-Guided Global Image Editing
Jing Shi
Ning Xu
Yihang Xu
Trung Bui
Franck Dernoncourt
Chenliang Xu
DiffM
LM&Ro
43
32
0
24 Jun 2021
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
130
7,639
0
11 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
550
5,920
0
29 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
80
1,512
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
666
28,659
0
26 Feb 2021
A Benchmark and Baseline for Language-Driven Image Editing
Jing Shi
Ning Xu
Trung Bui
Franck Dernoncourt
Zheng Wen
Chenliang Xu
DiffM
147
31
0
05 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
275
12,847
0
26 May 2020
CNN-generated images are surprisingly easy to spot... for now
Sheng-Yu Wang
Oliver Wang
Richard Y. Zhang
Andrew Owens
Alexei A. Efros
OOD
114
965
0
23 Dec 2019
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
88
1,352
0
08 Aug 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
502
10,466
0
12 Dec 2018
Fast R-CNN
Ross B. Girshick
ObjD
275
24,933
0
30 Apr 2015
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
234
43,290
0
01 May 2014