2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Papers citing "Visual Instruction Tuning"
50 / 3,223 papers shown
Title
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
30
78
0
29 May 2023
GlyphControl: Glyph Conditional Control for Visual Text Generation
Yukang Yang
Dongnan Gui
Yuhui Yuan
Weicong Liang
Haisong Ding
Hang-Rui Hu
Kai Chen
DiffM
27
77
0
29 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
30
22
0
27 May 2023
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
28
241
0
26 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-man Cheung
Min-Bin Lin
VLM
AAML
MLLM
19
166
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
32
52
0
25 May 2023
Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback
Yiqi Lin
Hao Wu
Ruichen Wang
H. Lu
Xiaodong Lin
Hui Xiong
Lin Wang
3DV
42
12
0
25 May 2023
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
41
196
0
25 May 2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
34
274
0
25 May 2023
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu
Xingxuan Zhang
Renzhe Xu
Jiashuo Liu
Yue He
Peng Cui
OOD
24
7
0
24 May 2023
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Geewook Kim
Hodong Lee
D. Kim
Haeji Jung
S. Park
Yoon Kim
Sangdoo Yun
Taeho Kil
Bado Lee
Seunghyun Park
VLM
35
4
0
24 May 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Yiyi Zhou
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLM
MLLM
26
89
0
24 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
23
219
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
30
54
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
44
43
0
24 May 2023
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
Cheng Qian
Chi Han
Yi Ren Fung
Yujia Qin
Zhiyuan Liu
Heng Ji
LRM
18
30
0
23 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
35
4
0
23 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
44
316
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
45
539
0
22 May 2023
TheoremQA: A Theorem-driven Question Answering dataset
Wenhu Chen
Ming Yin
Max W.F. Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
35
117
0
21 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
25
38
0
20 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
91
0
19 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
33
455
0
18 May 2023
Going Denser with Open-Vocabulary Part Segmentation
Pei Sun
Shoufa Chen
Chenchen Zhu
Fanyi Xiao
Ping Luo
Saining Xie
Zhicheng Yan
ObjD
VLM
20
45
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
31
114
0
18 May 2023
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Youwei Liang
Ruiyi Zhang
Li Zhang
Pengtao Xie
LM&MA
GNN
16
48
0
18 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
48
290
0
18 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
35
136
0
18 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
X. Wang
DiffM
39
7
0
18 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
93
693
0
17 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
28
51
0
13 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
19
1,903
0
11 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
37
529
0
10 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
18
71
0
09 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
22
79
0
09 May 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
T. Gong
Chengqi Lyu
Shilong Zhang
Yudong Wang
Miao Zheng
Qianmengke Zhao
Kuikun Liu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
MLLM
34
252
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
36
115
0
07 May 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
MLLM
39
504
0
05 May 2023
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
M. Zhang
MLLM
VLM
25
24
0
05 May 2023
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering
Lei Wang
Yilang Hu
Jiabang He
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
LRM
MLLM
23
41
0
05 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Yikang Shen
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
25
313
0
04 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
59
1
0
03 May 2023
Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT
Zhe Xiao
Yuzhong Chen
Lu Zhang
Jun Yao
Zihao Wu
...
Yixuan Yuan
Dinggang Shen
Dajiang Zhu
Tianming Liu
Xi Jiang
VLM
MLLM
60
17
0
29 Apr 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
33
550
0
28 Apr 2023
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Can Xu
Qingfeng Sun
Kai Zheng
Xiubo Geng
Pu Zhao
Jiazhan Feng
Chongyang Tao
Daxin Jiang
ALM
29
903
0
24 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
41
1,897
0
20 Apr 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu
Baolin Peng
Hao Cheng
Michel Galley
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Jianfeng Gao
KELM
MLLM
LRM
42
301
0
19 Apr 2023
Deep Unrestricted Document Image Rectification
Hao Feng
Shaokai Liu
Jiajun Deng
Wen-gang Zhou
Houqiang Li
ViT
21
13
0
18 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
35
741
0
28 Mar 2023