ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,277 papers shown
Title
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
147
0
28 Dec 2023
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
Wan Xu
Tianyu Huang
Tianyu Qu
Guanglei Yang
Yiwen Guo
Wangmeng Zuo
26
0
0
28 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile
  Devices
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
44
35
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A
  Survey
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
44
22
0
27 Dec 2023
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with
  Time-Decoupled Training and Reusable Coop-Diffusion
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu
Yuanfan Guo
Jianhua Han
Minzhe Niu
Yihan Zeng
Songcen Xu
Zeyi Huang
Zhao Zhong
Wei Zhang
Hang Xu
39
4
0
27 Dec 2023
LLM-SAP: Large Language Models Situational Awareness Based Planning
LLM-SAP: Large Language Models Situational Awareness Based Planning
Liman Wang
Hanyang Zhong
LLMAG
35
2
0
26 Dec 2023
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
Zhengzhuo Xu
Sinan Du
Yiyan Qi
Chengjin Xu
Chun Yuan
Jian Guo
45
36
0
26 Dec 2023
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric
  Robotic Manipulation
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li
Mingxu Zhang
Yiran Geng
Haoran Geng
Yuxing Long
Yan Shen
Renrui Zhang
Jiaming Liu
Hao Dong
LM&Ro
LRM
53
82
0
24 Dec 2023
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Kun Yan
Lei Ji
Zeyu Wang
Yuntao Wang
Nan Duan
Shuai Ma
63
8
0
22 Dec 2023
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
Yuehao Yin
Huiyan Qi
B. Zhu
Jingjing Chen
Yu-Gang Jiang
Chong-Wah Ngo
31
19
0
22 Dec 2023
MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
Liang Peng
Songyue Cai
Zongqian Wu
Huifang Shang
Xiaofeng Zhu
Xiaoxiao Li
47
9
0
22 Dec 2023
MetaAID 2.5: A Secure Framework for Developing Metaverse Applications
  via Large Language Models
MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models
Hongyin Zhu
41
6
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
972
0
21 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
64
124
0
21 Dec 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
29
24
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
45
248
0
20 Dec 2023
Exploring Multimodal Large Language Models for Radiology Report
  Error-checking
Exploring Multimodal Large Language Models for Radiology Report Error-checking
Jinge Wu
Yunsoo Kim
Eva C. Keller
Jamie Chow
Adam P. Levine
Nikolas Pontikos
Zina M. Ibrahim
Paul Taylor
Michelle C. Williams
Honghan Wu
LM&MA
22
3
0
20 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
50
29
0
19 Dec 2023
Mixture of Cluster-conditional LoRA Experts for Vision-language
  Instruction Tuning
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou
Zhili Liu
Kai Chen
Lanqing Hong
Hang Xu
Aoxue Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MoE
MLLM
VLM
49
63
0
19 Dec 2023
VQA4CIR: Boosting Composed Image Retrieval with Visual Question
  Answering
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng
Yang Bai
Yaoyu Zhang
Zhen Li
Salman Khan
Wangmeng Zuo
Xinxing Xu
Rick Siow Mong Goh
Yong-Jin Liu
37
5
0
19 Dec 2023
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang
Shaodi You
Yu Li
Ying Fu
MDE
51
18
0
19 Dec 2023
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human
  Hairstyles
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
V. Sklyarova
Egor Zakharov
Otmar Hilliges
Michael J. Black
Justus Thies
3DH
32
13
0
18 Dec 2023
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient
  Volumetric Encoder
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
Zhicong Tang
Shuyang Gu
Chunyu Wang
Ting Zhang
Jianmin Bao
DongDong Chen
Baining Guo
DiffM
40
23
0
18 Dec 2023
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM
  Finetuning
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao
Haoqin Tu
Chen Wei
Jieru Mei
Cihang Xie
28
33
0
18 Dec 2023
SPIRE: Semantic Prompt-Driven Image Restoration
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi
Zhengzhong Tu
Keren Ye
M. Delbracio
P. Milanfar
Qifeng Chen
Hossein Talebi
DiffM
38
11
0
18 Dec 2023
The Good, The Bad, and Why: Unveiling Emotions in Generative AI
The Good, The Bad, and Why: Unveiling Emotions in Generative AI
Cheng-rong Li
Jindong Wang
Yixuan Zhang
Kaijie Zhu
Xinyi Wang
Wenxin Hou
Jianxun Lian
Fang Luo
Qiang Yang
Xing Xie
LLMAG
26
14
0
18 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG
VLM
36
22
0
18 Dec 2023
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual
  Storytelling via Multi-Layered Semantic-Aware Denoising
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising
Bingyuan Wang
Hengyu Meng
Zeyu Cai
Lanjiong Li
Yue Ma
Qifeng Chen
Zeyu Wang
DiffM
37
3
0
18 Dec 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
41
24
0
17 Dec 2023
Silkie: Preference Distillation for Large Visual Language Models
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
117
69
0
17 Dec 2023
StarVector: Generating Scalable Vector Graphics Code from Images
StarVector: Generating Scalable Vector Graphics Code from Images
Juan A. Rodriguez
Shubham Agarwal
I. Laradji
Pau Rodríguez
David Vazquez
Christopher Pal
M. Pedersoli
51
6
0
17 Dec 2023
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base
Zhiwei Zha
Jiaan Wang
Zhixu Li
Xiangru Zhu
Wei Song
Yanghua Xiao
VLM
45
2
0
16 Dec 2023
One-Shot Learning as Instruction Data Prospector for Large Language
  Models
One-Shot Learning as Instruction Data Prospector for Large Language Models
Yunshui Li
Binyuan Hui
Xiaobo Xia
Jiaxi Yang
Min Yang
...
Ling-Hao Chen
Junhao Liu
Tongliang Liu
Fei Huang
Yongbin Li
38
32
0
16 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
51
3
0
15 Dec 2023
Osprey: Pixel Understanding with Visual Instruction Tuning
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan
Wentong Li
Jian Liu
Dongqi Tang
Xinjie Luo
Chi Qin
Lei Zhang
Jianke Zhu
MLLM
VLM
56
78
0
15 Dec 2023
Prompting Large Language Models for Topic Modeling
Prompting Large Language Models for Topic Modeling
Han Wang
Nirmalendu Prakash
N. Hoang
Ming Shan Hee
Usman Naseem
Roy Ka-wei Lee
38
25
0
15 Dec 2023
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language
  Models
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Xu Yang
Yingzhe Peng
Haoxuan Ma
Shuo Xu
Chi Zhang
Yucheng Han
Hanwang Zhang
37
5
0
15 Dec 2023
GSVA: Generalized Segmentation via Multimodal Large Language Models
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia
Dongchen Han
Yizeng Han
Xuran Pan
Shiji Song
Gao Huang
VLM
53
56
0
15 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language
  Understanding and Generation
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
24
33
0
14 Dec 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
  Planning States for Autonomous Driving
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
...
Lewei Lu
Xizhou Zhu
Xiaogang Wang
Yu Qiao
Jifeng Dai
36
127
0
14 Dec 2023
Pixel Aligned Language Models
Pixel Aligned Language Models
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
45
15
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
38
39
0
14 Dec 2023
Depicting Beyond Scores: Advancing Image Quality Assessment through
  Multi-modal Language Models
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You
Zheyuan Li
Jinjin Gu
Zhenfei Yin
Tianfan Xue
Chao Dong
EGVM
29
35
0
14 Dec 2023
Training-free Zero-shot Composed Image Retrieval with Local Concept
  Reranking
Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
Shitong Sun
Fanghua Ye
Shaogang Gong
34
13
0
14 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
44
18
0
13 Dec 2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object
  Identifiers
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
61
46
0
13 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
43
10
0
13 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
38
13
0
13 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human
  Pathology
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
26
20
0
13 Dec 2023
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao
Yuxuan Hu
Letian Wang
Steven L. Waslander
Yu Liu
Hongsheng Li
ELM
38
113
0
12 Dec 2023
Previous
123...545556...646566
Next