ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.03601
  4. Cited By
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

7 July 2023
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
    VLM
    MLLM
ArXivPDFHTML

Papers citing "GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest"

49 / 199 papers shown
Title
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
  Planning States for Autonomous Driving
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
...
Lewei Lu
Xizhou Zhu
Xiaogang Wang
Yu Qiao
Jifeng Dai
13
122
0
14 Dec 2023
Pixel Aligned Language Models
Pixel Aligned Language Models
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
43
14
0
14 Dec 2023
Tokenize Anything via Prompting
Tokenize Anything via Prompting
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
16
20
0
14 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
17
18
0
13 Dec 2023
AVA: Towards Autonomous Visualization Agents through Visual
  Perception-Driven Decision-Making
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making
Shusen Liu
Haichao Miao
Zhimin Li
M. Olson
Valerio Pascucci
P. Bremer
9
8
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
14
80
0
06 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
85
30
0
05 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLM
MLLM
LRM
17
62
0
04 Dec 2023
Segment and Caption Anything
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
21
13
0
01 Dec 2023
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual
  Prompts
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai
Haotian Liu
Dennis Park
Siva Karthik Mustikovela
Gregory P. Meyer
Yuning Chai
Yong Jae Lee
VLM
LRM
MLLM
15
75
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
6
49
0
01 Dec 2023
Merlin:Empowering Multimodal LLMs with Foresight Minds
Merlin:Empowering Multimodal LLMs with Foresight Minds
En Yu
Liang Zhao
Yana Wei
Jinrong Yang
Dongming Wu
...
Haoran Wei
Tiancai Wang
Zheng Ge
Xiangyu Zhang
Wenbing Tao
LRM
8
21
0
30 Nov 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding,
  Reasoning, and Planning
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
8
76
0
30 Nov 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models
  via Over-Trust Penalty and Retrospection-Allocation
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang
Xiao-wen Dong
Pan Zhang
Bin Wang
Conghui He
Jiaqi Wang
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
15
146
0
29 Nov 2023
TextDiffuser-2: Unleashing the Power of Language Models for Text
  Rendering
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen
Yupan Huang
Tengchao Lv
Lei Cui
Qifeng Chen
Furu Wei
DiffM
6
60
0
28 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large
  Language Models
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
35
13
0
24 Nov 2023
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle
  Training Approach for Vision-Language Models
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models
Xiaoyu Yang
Lijian Xu
Hao Sun
Hongsheng Li
Shaoting Zhang
ObjD
14
4
0
21 Nov 2023
Towards Open-Ended Visual Recognition with Large Language Model
Towards Open-Ended Visual Recognition with Large Language Model
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
VLM
12
8
0
14 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
20
11
0
14 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
10
26
0
11 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
58
3
0
10 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
30
102
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
35
45
0
08 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Erix Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
16
194
0
06 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
35
22
0
02 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
13
53
0
31 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
123
20
0
26 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language
  Models
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
23
28
0
13 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
9
275
0
11 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
20
2,395
0
05 Oct 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal
  Models
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
Bohan Zhai
Shijia Yang
Chenfeng Xu
Sheng Shen
Kurt Keutzer
Chunyuan Li
Manling Li
MLLM
13
11
0
03 Oct 2023
Pink: Unveiling the Power of Referential Comprehension for Multi-modal
  LLMs
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Shiyu Xuan
Qingpei Guo
Ming Yang
Shiliang Zhang
MLLM
ObjD
13
32
0
01 Oct 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
18
152
0
20 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
26
46
0
01 Sep 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
23
126
0
31 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large
  Language Models
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
6
42
0
25 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng-Tao Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
14
2
0
07 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and
  Understanding of the Open World
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
25
83
0
03 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
LISA: Reasoning Segmentation via Large Language Model
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
14
385
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
14
515
0
23 Jun 2023
Caption Anything: Interactive Image Description with Diverse Multimodal
  Controls
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
96
81
0
04 May 2023
Dense Distinct Query for End-to-End Object Detection
Dense Distinct Query for End-to-End Object Detection
Shilong Zhang
Wang xinjiang
Jiaqi Wang
Jiangmiao Pang
Chengqi Lyu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
59
111
0
22 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,232
0
22 Mar 2023
GRiT: A Generative Region-to-text Transformer for Object Understanding
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD
VLM
6
106
0
01 Dec 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
223
341
0
22 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
845
0
17 Feb 2021
Feature Pyramid Networks for Object Detection
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
154
3,574
0
09 Dec 2016
Previous
1234