ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM

Papers citing "Visual Instruction Tuning"

50 / 3,227 papers shown
How Robust is Google's Bard to Adversarial Image Attacks?
Yinpeng Dong
Huanran Chen
Jiawei Chen
Zhengwei Fang
X. Yang
Yichi Zhang
Yu Tian
Hang Su
Jun Zhu
AAML
26
102
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
34
170
0
20 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
31
63
0
20 Sep 2023
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Yuxing Long
Xiaoqi Li
Wenzhe Cai
Hao Dong
LLMAG
LM&Ro
21
44
0
20 Sep 2023
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Renqiu Xia
Bo-Wen Zhang
Hao Peng
Hancheng Ye
Xiangchao Yan
Peng Ye
Botian Shi
Yu Qiao
Junchi Yan
14
0
0
20 Sep 2023
"With Great Power Comes Great Responsibility!": Student and Instructor Perspectives on the influence of LLMs on Undergraduate Engineering Education
Ishika Joshi
Ritvik Budhiraja
Pranav Deepak Tanna
Lovenya Jain
Mihika Deshpande
Arjun Srivastava
Srinivas Rallapalli
Harshal D. Akolekar
Jagat Sesh Challa
Dhruv Kumar
AI4Ed
AI4CE
47
6
0
19 Sep 2023
Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Yuexiang Zhai
Shengbang Tong
Xiao Li
Mu Cai
Qing Qu
Yong Jae Lee
Y. Ma
VLM
MLLM
CLL
77
77
0
19 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
115
228
0
18 Sep 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
102
31
0
18 Sep 2023
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
24
6
0
18 Sep 2023
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Zihao Deng
Yi Ma
Yudong Liu
Rongchen Guo
Ge Zhang
Wenhu Chen
Wenhao Huang
Emmanouil Benetos
MLLM
AuLLM
26
18
0
15 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
21
1
0
15 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
28
133
0
14 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
52
17
0
14 Sep 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
27
3
0
14 Sep 2023
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Fen Fang
Yun Liu
Ali Koksal
Qianli Xu
Joo-Hwee Lim
VGen
DiffM
26
5
0
14 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
36
14
0
13 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
42
76
0
13 Sep 2023
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Dimitris Spathis
F. Kawsar
AI4TS
21
18
0
12 Sep 2023
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Tuan Dung Nguyen
Yuan-Sen Ting
I. Ciucă
Charlie O'Neill
Ze-Chang Sun
...
Alberto Accomazzi
J. P. Naiman
Jesse Cranney
Kevin Schawinski
UniverseTBD
11
20
0
12 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
455
0
11 Sep 2023
MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images
Weihao Liu
Fangyu Lei
Tongxu Luo
Jiahe Lei
Shizhu He
Jun Zhao
Kang Liu
LMTD
24
9
0
09 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
16
41
0
09 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
24
0
08 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng-Tao Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
49
116
0
07 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
DiffM
VLM
51
93
0
07 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
48
152
0
05 Sep 2023
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu
Jiyuan Zhang
Minyi Zhao
Zhenbang Sun
MLLM
25
41
0
05 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
A. Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
46
520
0
03 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
28
35
0
02 Sep 2023
TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models
Saeid Asgari Taghanaki
Aliasghar Khani
Ali Saheb Pasand
Amir Khasahmadi
Aditya Sanghi
K. Willis
Ali Mahdavi-Amiri
FAtt
VLM
22
0
0
01 Sep 2023
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
Ashmit Khandelwal
Aditya Agrawal
Aanisha Bhattacharyya
Yaman Kumar Singla
Somesh Singh
...
Ishita Dasgupta
Stefano Petrangeli
R. Shah
Changyou Chen
Balaji Krishnamurthy
16
8
0
01 Sep 2023
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey
Euan Ong
Stuart J. Russell
Scott Emmons
VLM
MLLM
16
78
0
01 Sep 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
59
149
0
31 Aug 2023
TouchStone: Evaluating Vision-Language Models by Language Models
Shuai Bai
Shusheng Yang
Jinze Bai
Peng Wang
Xing Zhang
Junyang Lin
Xinggang Wang
Chang Zhou
Jingren Zhou
MLLM
37
44
0
31 Aug 2023
Enhancing Subtask Performance of Multi-modal Large Language Model
Yongqiang Zhao
Zhenyu Li
Feng Zhang
Xinhai Xu
Donghong Liu
LRM
19
0
0
31 Aug 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
38
22
0
31 Aug 2023
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu
Bingke Zhu
Guibo Zhu
Yingying Chen
Ming Tang
Jinqiao Wang
VLM
MLLM
25
100
0
29 Aug 2023
Evaluation and Analysis of Hallucination in Large Vision-Language Models
Junyan Wang
Yi Zhou
Guohai Xu
Pengcheng Shi
Chenlin Zhao
...
Mingshi Yan
Ji Zhang
Jihua Zhu
Jitao Sang
Haoyu Tang
MLLM
21
66
0
29 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
22
26
0
28 Aug 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
30
4
0
27 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
14
45
0
25 Aug 2023
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
Zhiyuan Zhao
Linke Ouyang
Bin Wang
Siyuan Huang
Pan Zhang
Xiao-wen Dong
Jiaqi Wang
Conghui He
MLLM
26
5
0
25 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
50
793
0
24 Aug 2023
DLIP: Distilling Language-Image Pre-training
Huafeng Kuang
Jie Wu
Xiawu Zheng
Ming Li
Xuefeng Xiao
Rui Wang
Min Zheng
Rongrong Ji
VLM
36
4
0
24 Aug 2023
VIGC: Visual Instruction Generation and Correction
Bin Wang
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
...
Xiao-wen Dong
Weijia Li
Wei Li
Jiaqi Wang
Conghui He
MLLM
27
62
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
24
56
0
23 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
29
48
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
22
3
0
22 Aug 2023