ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.03895
  4. Cited By
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

7 September 2023
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
Ting Zhang
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
    DiffM
    VLM
ArXivPDFHTML

Papers citing "InstructDiffusion: A Generalist Modeling Interface for Vision Tasks"

35 / 35 papers shown
Title
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
C. L. P. Chen
Sijie Zhu
DiffM
71
1
0
05 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
Y. Li
Y. Zhang
DiffM
77
0
0
01 May 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
60
0
0
15 Apr 2025
Towards Explainable Partial-AIGC Image Quality Assessment
Towards Explainable Partial-AIGC Image Quality Assessment
Jiaying Qian
Ziheng Jia
Zicheng Zhang
Zeyu Zhang
Guangtao Zhai
Xiongkuo Min
35
0
0
12 Apr 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou
J. Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
62
1
0
25 Mar 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
45
0
0
23 Feb 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
RORem: Training a Robust Object Remover with Human-in-the-Loop
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li
Tao Yang
Song Guo
L. Zhang
40
3
0
01 Jan 2025
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
P. Xu
Boyuan Jiang
Xiaobin Hu
Donghao Luo
Q. He
J. Zhang
Chengjie Wang
Yunsheng Wu
Charles X. Ling
Boyu Wang
87
2
0
24 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
M. Gong
Tongliang Liu
92
6
0
18 Nov 2024
ColorEdit: Training-free Image-Guided Color editing with diffusion model
ColorEdit: Training-free Image-Guided Color editing with diffusion model
Xingxi Yin
Zhi Li
Jingfeng Zhang
Chenglin Li
Yin Zhang
DiffM
47
0
0
15 Nov 2024
Visual-O1: Understanding Ambiguous Instructions via Multi-modal
  Multi-turn Chain-of-thoughts Reasoning
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni
Yutao Fan
Lei Zhang
Wangmeng Zuo
LRM
AI4CE
24
6
0
04 Oct 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
50
10
0
23 Sep 2024
NeIn: Telling What You Don't Want
NeIn: Telling What You Don't Want
Nhat-Tan Bui
Dinh-Hieu Hoang
Quoc-Huy Trinh
Minh-Triet Tran
Truong Nguyen
Susan Gauch
29
2
0
09 Sep 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Haozhao Wang
Zhicheng Chen
Peilin Zhao
VLM
MLLM
61
18
0
04 Aug 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and
  Editing
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
36
25
0
08 Jul 2024
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Ying Jin
Pengyang Ling
Xiao-wen Dong
Pan Zhang
Jiaqi Wang
Dahua Lin
24
2
0
18 May 2024
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in
  the Wild
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
Donggyun Kim
Seongwoong Cho
Semin Kim
Chong Luo
Seunghoon Hong
VLM
21
2
0
29 Apr 2024
Explore In-Context Segmentation via Latent Diffusion Models
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
60
6
0
14 Mar 2024
Diffusion Model-Based Image Editing: A Survey
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Liangliang Cao
Liangliang Cao
EGVM
66
82
0
27 Feb 2024
CCA: Collaborative Competitive Agents for Image Editing
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
20
5
0
23 Jan 2024
SPIRE: Semantic Prompt-Driven Image Restoration
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi
Zhengzhong Tu
Keren Ye
M. Delbracio
P. Milanfar
Qifeng Chen
Hossein Talebi
DiffM
19
11
0
18 Dec 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion
  Models
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Zhen Xing
Qi Dai
Zihao Zhang
Hui Zhang
Hang-Rui Hu
Zuxuan Wu
Yu-Gang Jiang
VGen
30
17
0
30 Nov 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned
  Multi-modal Sequence Generation
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Hongsheng Li
MLLM
22
8
0
30 Nov 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image
  Editing
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
105
235
0
16 Jun 2023
In-Context Learning Unlocked for Diffusion Models
In-Context Learning Unlocked for Diffusion Models
Zhendong Wang
Yifan Jiang
Yadong Lu
Yelong Shen
Pengcheng He
Weizhu Chen
Zhangyang Wang
Mingyuan Zhou
VLM
DiffM
86
68
0
01 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
133
308
0
04 Dec 2021
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
A Benchmark and Baseline for Language-Driven Image Editing
A Benchmark and Baseline for Language-Driven Image Editing
Jing Shi
Ning Xu
Trung Bui
Franck Dernoncourt
Zheng Wen
Chenliang Xu
DiffM
110
30
0
05 Oct 2020
GIQA: Generated Image Quality Assessment
GIQA: Generated Image Quality Assessment
Shuyang Gu
Jianmin Bao
Dong Chen
Fang Wen
EGVM
141
81
0
19 Mar 2020
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
262
10,183
0
12 Dec 2018
1