Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Instruction Tuning"
50 / 3,278 papers shown
Title
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
31
4
0
12 Feb 2024
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu
Weijian Deng
Dylan Campbell
Stephen Gould
Tom Gedeon
VLM
35
7
0
12 Feb 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang
Jinyi Hu
Mahyar Khayatkhoei
Filip Ilievski
Maosong Sun
LRM
29
10
0
12 Feb 2024
Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
Ross Greer
Mohan M. Trivedi
45
19
0
11 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging
M. A. Bravo
Thomas Brox
VLM
51
11
0
11 Feb 2024
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs
Zicheng Zhang
Haoning Wu
Erli Zhang
Guangtao Zhai
Weisi Lin
VLM
29
8
0
11 Feb 2024
Reasoning Grasping via Multimodal Large Language Model
Shiyu Jin
Jinxuan Xu
Yutian Lei
Liangjun Zhang
LRM
39
20
0
09 Feb 2024
Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing
Hochul Hwang
Sunjae Kwon
Yekyung Kim
Donghyun Kim
32
11
0
09 Feb 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
46
29
0
09 Feb 2024
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
Unggi Lee
Minji Jeon
Yunseo Lee
Gyuri Byun
Yoorim Son
Jaeyoon Shin
Hongkyu Ko
Hyeoncheol Kim
22
8
0
09 Feb 2024
ScreenAgent: A Vision Language Model-driven Computer Control Agent
Runliang Niu
Jindong Li
Shiqi Wang
Yali Fu
Xiyu Hu
Xueyuan Leng
He Kong
Yi Chang
Qi Wang
LLMAG
MLLM
LM&Ro
66
39
0
09 Feb 2024
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan
Min Bai
Weifeng Chen
Xiong Zhou
Qixing Huang
Erran L. Li
VLM
25
19
0
09 Feb 2024
LLMs for Coding and Robotics Education
Peng Shu
Huaqin Zhao
Hanqi Jiang
Yiwei Li
Shaochen Xu
...
Zheng Liu
Guoyu Lu
Le Guan
Gong Chen
Xianqiao Wang Tianming Liu
47
5
0
09 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
377
0
09 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi
M. Nazemi
Massoud Pedram
ViT
MQ
48
2
0
08 Feb 2024
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz
Yair Kittenplon
Aviad Aberdam
Elad Ben Avraham
Oren Nuriel
Shai Mazor
Ron Litman
52
20
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
110
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
85
4
0
08 Feb 2024
Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation
Shivang Chopra
Suraj Kothawade
Houda Aynaou
Aman Chadha
DiffM
42
1
0
07 Feb 2024
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
Jinghui Lu
Ziwei Yang
Yanjie Wang
Xuejing Liu
Brian Mac Namee
Can Huang
MoE
55
5
0
07 Feb 2024
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil
Chan Hee Song
Boyuan Zheng
Xiang Deng
Yu-Chuan Su
Wei-Lun Chao
EgoV
22
12
0
06 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
100
42
0
06 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
35
335
0
06 Feb 2024
The Essential Role of Causality in Foundation World Models for Embodied AI
Tarun Gupta
Wenbo Gong
Chao Ma
Nick Pawlowski
Agrin Hilmkil
...
Jianfeng Gao
Stefan Bauer
Danica Kragic
Bernhard Schölkopf
Cheng Zhang
43
15
0
06 Feb 2024
Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction
Yonggang Jin
Ge Zhang
Hao Zhao
Tianyu Zheng
Jiawei Guo
Liuyu Xiang
Shawn Yue
Stephen W. Huang
Zhaofeng He
Jie Fu
OffRL
44
4
0
06 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo Zhang
Chunhua Shen
VLM
MLLM
33
100
0
06 Feb 2024
Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
Kun Ouyang
Liqiang Jing
Xuemeng Song
Meng Liu
Yupeng Hu
Liqiang Nie
99
3
0
06 Feb 2024
RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Tomoyuki Kagaya
Thong Jing Yuan
Yuxuan Lou
J. Karlekar
Sugiri Pranata
Akira Kinose
Koki Oguri
Felix Wick
Yang You
LLMAG
54
33
0
06 Feb 2024
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang
Runyu Ding
Ellis L Brown
Xiaojuan Qi
Saining Xie
LM&Ro
59
19
0
05 Feb 2024
Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
Yuancheng Xu
Jiarui Yao
Manli Shu
Yanchao Sun
Zichu Wu
Ning Yu
Tom Goldstein
Furong Huang
AAML
43
17
0
05 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
55
42
0
05 Feb 2024
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu
Weichao Zeng
Zhenhang Li
Fangmin Zhao
Yu Zhou
37
3
0
05 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
58
13
0
05 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
43
30
0
05 Feb 2024
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea
Alexey Gritsenko
Cordelia Schmid
Anurag Arnab
VLM
40
13
0
05 Feb 2024
Position: What Can Large Language Models Tell Us about Time Series Analysis
Ming Jin
Yifan Zhang
Wei Chen
Kexin Zhang
Keli Zhang
Bin Yang
Jindong Wang
Shirui Pan
Qingsong Wen
AI4TS
39
16
0
05 Feb 2024
Image-Caption Encoding for Improving Zero-Shot Generalization
Eric Yang Yu
Christopher Liao
Sathvik Ravi
Theodoros Tsiligkaridis
Brian Kulis
OODD
VLM
27
0
0
05 Feb 2024
Generalizable Entity Grounding via Assistance of Large Language Model
Lu Qi
Yi-Wen Chen
Lehan Yang
Tiancheng Shen
Xiangtai Li
Weidong Guo
Yu-Syuan Xu
Ming-Hsuan Yang
VLM
69
9
0
04 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
82
53
0
04 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
34
2
0
04 Feb 2024
A Survey on Robotics with Foundation Models: toward Embodied AI
Zhiyuan Xu
Kun Wu
Junjie Wen
Jinming Li
Ning Liu
Zhengping Che
Jian Tang
AI4CE
LRM
LM&Ro
33
24
0
04 Feb 2024
AutoTimes: Autoregressive Time Series Forecasters via Large Language Models
Yong Liu
Guo Qin
Xiangdong Huang
Jianmin Wang
Mingsheng Long
AI4TS
37
22
0
04 Feb 2024
Jailbreaking Attack against Multimodal Large Language Model
Zhenxing Niu
Haoxuan Ji
Xinbo Gao
Gang Hua
Rong Jin
50
61
0
04 Feb 2024
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong
Ondrej Bohdal
Tingyang Yu
Yongxin Yang
Timothy M. Hospedales
VLM
MLLM
57
60
0
03 Feb 2024
GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events
Xingcheng Zhou
Alois Knoll
21
9
0
03 Feb 2024
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
Yanbin Wei
Shuai Fu
Weisen Jiang
Zejian Zhang
Zhixiong Zeng
Qi Wu
James T. Kwok
Yu Zhang
35
12
0
03 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
61
12
0
02 Feb 2024
Explaining latent representations of generative models with large multimodal models
Mengdan Zhu
Zhenke Liu
Bo Pan
Abhinav Angirekula
Liang Zhao
42
2
0
02 Feb 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud
Hani Itani
Fabio Pizzati
Philip Torr
Adel Bibi
Guohao Li
CLIP
VLM
122
37
0
02 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Ming-Yu Liu
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
78
75
0
02 Feb 2024
Previous
1
2
3
...
51
52
53
...
64
65
66
Next