2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Papers citing "Visual Instruction Tuning"
50 / 3,228 papers shown
Title
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
56
108
0
26 Oct 2023
Exploring Question Decomposition for Zero-Shot VQA
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
17
10
0
25 Oct 2023
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Aaron Gokaslan
A. Feder Cooper
Jasmine Collins
Landan Seguin
Austin Jacobson
Mihir Patel
Jonathan Frankle
Cory Stephenson
Volodymyr Kuleshov
DiffM
17
16
0
25 Oct 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
30
44
0
25 Oct 2023
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
Ge Zheng
Bin Yang
Jiajin Tang
Hong-Yu Zhou
Sibei Yang
LRM
MLLM
29
93
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
38
114
0
24 Oct 2023
Integrating View Conditions for Image Synthesis
Jinbin Bai
Zhen Dong
Aosong Feng
Xiao Zhang
Tian-Chun Ye
Kaicheng Zhou
67
13
0
24 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM
MLLM
34
155
0
23 Oct 2023
Meaning Representations from Trajectories in Autoregressive Models
Tian Yu Liu
Matthew Trager
Alessandro Achille
Pramuditha Perera
L. Zancato
Stefano Soatto
26
14
0
23 Oct 2023
Vision Language Models in Autonomous Driving: A Survey and Outlook
Xingcheng Zhou
Mingyu Liu
Ekim Yurtsever
B. L. Žagar
Walter Zimmer
Hu Cao
Alois C. Knoll
VLM
29
36
0
22 Oct 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
56
3
0
22 Oct 2023
MarineGPT: Unlocking Secrets of Ocean to the Public
Ziqiang Zheng
Jipeng Zhang
Tuan-Anh Vu
Shizhe Diao
Yue Him Wong Tim
Sai-Kit Yeung
35
11
0
20 Oct 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Mingwei Zhu
Leigang Sha
Yu Shu
Kangjia Zhao
Tiancheng Zhao
Jianwei Yin
LRM
27
0
0
20 Oct 2023
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng
Jiazheng Liu
Yicheng Feng
Zongqing Lu
42
29
0
20 Oct 2023
Eureka: Human-Level Reward Design via Coding Large Language Models
Yecheng Jason Ma
William Liang
Guanzhi Wang
De-An Huang
Osbert Bastani
Dinesh Jayaraman
Yuke Zhu
Linxi Fan
A. Anandkumar
21
291
0
19 Oct 2023
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
46
5
0
19 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
17
1
0
18 Oct 2023
Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling
Yaqing Wang
Jialin Wu
T. Dabral
Jiageng Zhang
Geoff Brown
...
Frederick Liu
Yi Liang
Bo Pang
Michael Bendersky
Radu Soricut
VLM
23
14
0
18 Oct 2023
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation
Shengqiang Zhang
Philipp Wicke
Lutfi Kerem Senel
Luis F. C. Figueredo
Abdeldjallil Naceri
Sami Haddadin
Barbara Plank
Hinrich Schütze
LM&Ro
29
10
0
18 Oct 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chun-yue Li
Jianfeng Gao
MLLM
VLM
30
158
0
17 Oct 2023
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
Lv Tang
Peng-Tao Jiang
Haoke Xiao
Bo Li
VLM
13
7
0
17 Oct 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
57
116
0
16 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
13
2
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
27
1
0
15 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
46
1
0
14 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
441
0
14 Oct 2023
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks
Xiaocui Yang
Wenfang Wu
Shi Feng
Ming Wang
Daling Wang
Yang Li
Qi Sun
Yifei Zhang
Xiaoming Fu
Soujanya Poria
LRM
ELM
25
10
0
13 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
33
34
0
13 Oct 2023
Multimodal Large Language Model for Visual Navigation
Yao-Hung Tsai
Vansh Dhar
Jialu Li
Bowen Zhang
Jian Zhang
VLM
LM&Ro
22
9
0
12 Oct 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang
Yuhao Dong
Shuai Liu
Bo-wen Li
Ziyue Wang
...
Haoran Tan
Jiamu Kang
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
LM&Ro
46
45
0
12 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
34
9
0
12 Oct 2023
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
LRM
MLLM
DiffM
13
22
0
12 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
28
28
0
12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
14
1
0
12 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
17
7
0
12 Oct 2023
AutoRepo: A general framework for multi-modal LLM-based automated construction reporting
Hongxu Pu
Xincong Yang
Jing Li
Runhao Guo
Heng Li
17
6
0
11 Oct 2023
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao
Banghua Zhu
36
4
0
11 Oct 2023
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation
Jie An
Zhengyuan Yang
Linjie Li
Jianfeng Wang
K. Lin
Zicheng Liu
Lijuan Wang
Jiebo Luo
17
11
0
11 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
24
301
0
11 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
34
28
0
11 Oct 2023
Composite Backdoor Attacks Against Large Language Models
Hai Huang
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
27
36
0
11 Oct 2023
KwaiYiiMath: Technical Report
Jia-Yi Fu
Lei Lin
Xiaoyang Gao
Pengli Liu
Zhengzong Chen
...
Zijia Lin
Fuzheng Zhang
Zhongyuan Wang
Di Zhang
Kun Gai
LRM
ReLM
RALM
51
2
0
11 Oct 2023
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner
Simon Durand
Daniel Stoller
Rachel M. Bittner
AuLLM
31
14
0
11 Oct 2023
Making Large Language Models Perform Better in Knowledge Graph Completion
Yichi Zhang
Zhuo Chen
Lingbing Guo
Yajing Xu
Wen Zhang
Hua-zeng Chen
32
41
0
10 Oct 2023
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang
Xiaotong Zhai
Zhongkai Zhao
Yongshuo Zong
Xin Wen
Bingchen Zhao
LRM
11
0
0
10 Oct 2023
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
Ning Liao
Shaofeng Zhang
Renqiu Xia
Min Cao
Yu Qiao
Junchi Yan
MLLM
34
0
0
10 Oct 2023
Improving Compositional Text-to-image Generation with Large Vision-Language Models
Song Wen
Guian Fang
Renrui Zhang
Peng Gao
Hao Dong
Dimitris N. Metaxas
25
17
0
10 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM
LRM
28
8
0
09 Oct 2023
Visual Storytelling with Question-Answer Plans
Danyang Liu
Mirella Lapata
Frank Keller
CoGe
11
9
0
08 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
39
2
0
08 Oct 2023