2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Papers citing "Visual Instruction Tuning"
50 / 2,162 papers shown
Title
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
66
22
0
10 Jun 2024
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia
Ze Chen
Juanxi Tian
Yangrui Gong
Ruibo Hou
...
Jimeng Sun
Zongyuan Ge
Gang Li
James Zou
Huaxiu Yao
MU
VLM
61
31
0
10 Jun 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
43
24
0
10 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
29
9
0
09 Jun 2024
MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models
Yanjie Li
Weijun Li
Lina Yu
Min Wu
Jingyi Liu
Wenqiang Li
Shu Wei
Yusong Deng
OffRL
29
3
0
08 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
AAML
VLM
38
13
0
08 Jun 2024
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang
Wenbin He
Xiwei Xuan
Clint Sebastian
Jorge Henrique Piazentin Ono
...
Sima Behpour
T. Doan
Liang Gou
Han-Wei Shen
Liu Ren
VLM
27
5
0
07 Jun 2024
Large Generative Graph Models
Yu Wang
Ryan A. Rossi
Namyong Park
Huiyuan Chen
Nesreen K. Ahmed
Puja Trivedi
Franck Dernoncourt
Danai Koutra
Tyler Derr
AI4CE
31
3
0
07 Jun 2024
Composition Vision-Language Understanding via Segment and Depth Anything Model
Mingxiao Huo
Pengliang Ji
Haotian Lin
Junchen Liu
Yixiao Wang
Yijun Chen
VLM
26
1
0
07 Jun 2024
Semantic Segmentation on VSPW Dataset through Masked Video Consistency
Chen Liang
Qiang Guo
Chongkai Yu
Chengjing Wu
Ting Liu
Luoqi Liu
37
1
0
07 Jun 2024
InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment
Yuxing Long
Wenzhe Cai
Hongcheng Wang
Guanqi Zhan
Hao Dong
30
21
0
07 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
36
0
0
07 Jun 2024
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang
Shiyu Xuan
Shiliang Zhang
LRM
37
5
0
07 Jun 2024
LinkGPT: Teaching Large Language Models To Predict Missing Links
Zhongmou He
Jing Zhu
Shengyi Qian
Joyce Chai
Danai Koutra
LRM
34
1
0
07 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
68
11
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
59
31
0
07 Jun 2024
MAIRA-2: Grounded Radiology Report Generation
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
Anton Schwaighofer
Sam Bond-Taylor
...
Anja Thieme
M. Lungren
Maria T. A. Wetscherek
Javier Alvarez-Valle
Stephanie L. Hyland
40
33
0
06 Jun 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Yu Qiao
Jie Zhou
Jifeng Dai
65
1
0
06 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
22
8
0
06 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
72
142
0
06 Jun 2024
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu
Yi Zhang
Song Bai
Adam Kortylewski
Alan Yuille
34
9
0
06 Jun 2024
What is Dataset Distillation Learning?
William Yang
Ye Zhu
Zhiwei Deng
Olga Russakovsky
DD
36
3
0
06 Jun 2024
Understanding Information Storage and Transfer in Multi-modal Large Language Models
Samyadeep Basu
Martin Grayson
C. Morrison
Besmira Nushi
S. Feizi
Daniela Massiceti
20
10
0
06 Jun 2024
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying
Aishan Liu
Tianyuan Zhang
Zhengmin Yu
Siyuan Liang
Xianglong Liu
Dacheng Tao
AAML
33
26
0
06 Jun 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Jianben He
Xingbo Wang
Shiyi Liu
Guande Wu
Claudio Silva
Huamin Qu
LRM
29
1
0
06 Jun 2024
Low-Rank Similarity Mining for Multimodal Dataset Distillation
Yue Xu
Zhilin Lin
Yusong Qiu
Cewu Lu
Yong-Lu Li
DD
41
4
0
06 Jun 2024
JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits
Minzhou Pan
Yi Zeng
Xue Lin
Ning Yu
Cho-Jui Hsieh
Peter Henderson
Ruoxi Jia
WIGM
36
3
0
06 Jun 2024
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions
Lei Liu
Xiaoyan Yang
Junchi Lei
Xiaoyang Liu
Yue Shen
...
Peng Wei
Jinjie Gu
Zhixuan Chu
Zhan Qin
Kui Ren
LM&MA
AILaw
34
14
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
33
6
0
05 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Yizhou Sun
Kai-Wei Chang
Aditya Grover
EGVM
VGen
32
36
0
05 Jun 2024
AD-H: Autonomous Driving with Hierarchical Agents
Zaibin Zhang
Shiyu Tang
Yuanhang Zhang
Talas Fu
Yifan Wang
Yang Liu
Dong Wang
Jing Shao
Lijun Wang
H. Lu
42
3
0
05 Jun 2024
Training of Physical Neural Networks
Ali Momeni
Babak Rahmani
B. Scellier
Logan G. Wright
Peter L. McMahon
...
Julie Grollier
Andrea J. Liu
D. Psaltis
Andrea Alù
Romain Fleury
PINN
AI4CE
47
9
0
05 Jun 2024
Balancing Performance and Efficiency in Zero-shot Robotic Navigation
Dmytro Kuzmenko
N. Shvai
LM&Ro
20
0
0
05 Jun 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang
Jacob Sansom
Ziqiao Ma
Felix Gervits
Joyce Chai
36
17
0
05 Jun 2024
LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery
Samuel Scheele
Katherine Picchione
Jeffrey Liu
32
0
0
04 Jun 2024
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao
Alexandros Graikos
Jingwei Zhang
Sounak Mondal
Minh Hoai
Dimitris Samaras
30
0
0
04 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
20
3
0
04 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Y. Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
25
9
0
04 Jun 2024
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Changhao Li
Xinyu Sun
Peihao Chen
Jugang Fan
Zixu Wang
Yanxia Liu
Jinhui Zhu
Chuang Gan
Mingkui Tan
46
1
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
37
10
0
04 Jun 2024
GOMAA-Geo: GOal Modality Agnostic Active Geo-localization
Anindya Sarkar
S. Sastry
Aleksis Pirinen
Chongjie Zhang
Nathan Jacobs
Yevgeniy Vorobeychik
44
4
0
04 Jun 2024
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li
Yu Ye
Bingchuan Jiang
Wei Zeng
VLM
LRM
31
7
0
03 Jun 2024
PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Yupeng Zheng
Zebin Xing
Qichao Zhang
Bu Jin
Pengfei Li
...
Zhongpu Xia
Kun Zhan
Xianpeng Lang
Yaran Chen
Dongbin Zhao
LM&Ro
LRM
LLMAG
55
14
0
03 Jun 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
48
44
0
03 Jun 2024
SLANT: Spurious Logo ANalysis Toolkit
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
35
0
0
03 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
26
17
0
03 Jun 2024
It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators
Aditi Ramaswamy
Melane Navaratnarajah
Hana Chockler
EGVM
34
0
0
03 Jun 2024
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
Hantao Zhou
Longxiang Tang
Rui Yang
Guanyi Qin
Yan Zhang
Runze Hu
Xiu Li
34
5
0
03 Jun 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang
Haiyang Xu
Haitao Jia
Xi Zhang
Ming Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
LM&Ro
LLMAG
29
45
0
03 Jun 2024