Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Instruction Tuning"
50 / 3,223 papers shown
Title
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
30
1
0
20 Aug 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Y. Zhang
MLLM
29
9
0
19 Aug 2023
PUMGPT: A Large Vision-Language Model for Product Understanding
Wei Xue
Zongyi Guo
Baoliang Cui
Zengming Tang
Weiwei Zhang
Haihong Tang
Shuhui Wu
Weiming Lu
VLM
30
2
0
18 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
35
0
0
18 Aug 2023
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
Yulin Su
Min Yang
Minghui Qiu
Jing Wang
Tao Wang
VLM
25
0
0
17 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
19
59
0
17 Aug 2023
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
Ching Chang
Wei-Yao Wang
Wenjie Peng
Tien-Fu Chen
AI4TS
38
45
0
16 Aug 2023
High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark
Ben Chen
Xuechao Zou
K. Li
Yu-an Zhang
Junliang Xing
Pin Tao
23
1
0
16 Aug 2023
Link-Context Learning for Multimodal LLMs
Yan Tai
Weichen Fan
Zhao Zhang
Feng Zhu
Rui Zhao
Ziwei Liu
ReLM
LRM
21
17
0
15 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
65
117
0
14 Aug 2023
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Peng Liu
Yiming Ren
Jun Tao
Zhixiang Ren
AI4CE
25
79
0
14 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
19
154
0
11 Aug 2023
Thinking Like an Expert:Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals
Fanglong Yao
Changyuan Tian
Jintao Liu
Zequn Zhang
Qing Liu
Li Jin
Shuchao Li
Xiaoyu Li
Xian Sun
ReLM
LRM
14
15
0
11 Aug 2023
Encode-Store-Retrieve: Enhancing Memory Augmentation through Language-Encoded Egocentric Perception
Junxiao Shen
John J. Dudley
Per Ola Kristensson
RALM
20
0
0
10 Aug 2023
Fine-Tune Language Models as Multi-Modal Differential Equation Solvers
Liu Yang
Siting Liu
Stanley J. Osher
21
0
0
09 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
27
68
0
08 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
33
0
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng-Tao Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
32
2
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
43
605
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
35
84
0
03 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
33
401
0
02 Aug 2023
Revisiting DETR Pre-training for Object Detection
Yan Ma
Weicong Liang
Bo-Ying Chen
Yiduo Hao
Bojian Hou
Xiangyu Yue
Chao Zhang
Yuhui Yuan
VLM
ViT
35
4
0
02 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
31
392
0
01 Aug 2023
Learning to Model the World with Language
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&Ro
SyDa
26
50
0
31 Jul 2023
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
Ziao Wang
Yuhang Li
Junda Wu
Jaehyeon Soon
Xiaofeng Zhang
MLLM
17
15
0
31 Jul 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
22
260
0
31 Jul 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
23
2
0
31 Jul 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
61
42
0
30 Jul 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
30
500
0
30 Jul 2023
CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition
Jiang Li
Xiaoping Wang
Yingjian Liu
Zhigang Zeng
24
16
0
28 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
18
99
0
28 Jul 2023
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
30
127
0
26 Jul 2023
WavJourney: Compositional Audio Creation with Large Language Models
Xubo Liu
Zhongkai Zhu
Haohe Liu
Yiitan Yuan
Meng Cui
...
Jinhua Liang
Yin Cao
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
AuLLM
21
25
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
32
118
0
25 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
29
8
0
24 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
23
85
0
20 Jul 2023
Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
Eugene Bagdasaryan
Tsung-Yin Hsieh
Ben Nassi
Vitaly Shmatikov
18
79
0
19 Jul 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLM
LRM
31
53
0
18 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
33
106
0
17 Jul 2023
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM
MLLM
8
90
0
16 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
27
25
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavavs
VLM
MLLM
18
29
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
64
523
0
12 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Bo-wen Li
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
27
903
0
12 Jul 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
24
10
0
12 Jul 2023
One-Versus-Others Attention: Scalable Multimodal Integration for Clinical Data
Michal Golovanevsky
Eva Schiller
Akira Nair
Ritambhara Singh
Carsten Eickhoff
16
2
0
11 Jul 2023
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
34
126
0
11 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
47
0
0
10 Jul 2023
SVIT: Scaling up Visual Instruction Tuning
Bo-Lu Zhao
Boya Wu
Muyang He
Tiejun Huang
MLLM
36
119
0
09 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
30
117
0
08 Jul 2023
Previous
1
2
3
...
61
62
63
64
65
Next