ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,212 papers shown
Title
Stone Needle: A General Multimodal Large-scale Model Framework towards
  Healthcare
Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare
Weihua Liu
Yong Zuo
LM&MA
11
2
0
28 Jun 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
32
136
0
28 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
42
596
0
27 Jun 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
14
20
0
26 Jun 2023
Are aligned neural networks adversarially aligned?
Are aligned neural networks adversarially aligned?
Nicholas Carlini
Milad Nasr
Christopher A. Choquette-Choo
Matthew Jagielski
Irena Gao
...
Pang Wei Koh
Daphne Ippolito
Katherine Lee
Florian Tramèr
Ludwig Schmidt
AAML
22
222
0
26 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
38
694
0
26 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust
  Instruction Tuning
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
22
239
0
26 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
DesCo: Learning Object Recognition with Rich Language Descriptions
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjD
VLM
24
20
0
24 Jun 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
48
553
0
23 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum
  Likelihood Estimation
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
24
2
0
23 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language
  Models
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
42
759
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
23
137
0
22 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
Dense Video Object Captioning from Disjoint Supervision
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
25
3
0
20 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
27
52
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
68
191
0
19 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest
  Cost
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
Juexiao Zhou
Xiuying Chen
Xin Gao
LM&MA
AI4CE
85
12
0
19 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
38
12
0
16 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large
  Vision-Language Models
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng-Tao Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
28
159
0
15 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and
  Text Integration
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
29
160
0
15 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large
  Language Models
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
33
7
0
14 Jun 2023
AVIS: Autonomous Visual Information Seeking with Large Language Model
  Agent
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
Ziniu Hu
Ahmet Iscen
Chen Sun
Kai-Wei Chang
Yizhou Sun
David A. Ross
Cordelia Schmid
Alireza Fathi
31
11
0
13 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
20
6
0
12 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
40
189
0
12 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
  Framework, and Benchmark
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
22
152
0
11 Jun 2023
Leveraging Large Language Models for Scalable Vector Graphics-Driven
  Image Understanding
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Mu Cai
Zeyi Huang
Yuheng Li
Utkarsh Ojha
Haohan Wang
Yong Jae Lee
VLM
16
2
0
09 Jun 2023
On the Challenges and Perspectives of Foundation Models for Medical
  Image Analysis
On the Challenges and Perspectives of Foundation Models for Medical Image Analysis
Shaoting Zhang
Dimitris N. Metaxas
LM&MA
VLM
MedIm
AI4CE
31
128
0
09 Jun 2023
Artificial General Intelligence for Medical Imaging
Artificial General Intelligence for Medical Imaging
Xiang Li
Lu Zhang
Zihao Wu
Zheng Liu
Lin Zhao
...
Pingkuan Yan
Quanzheng Li
W. Liu
Tianming Liu
Dinggang Shen
LM&MA
AI4CE
19
40
0
08 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
C. Li
Ziwei Liu
MLLM
VLM
31
224
0
08 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual
  Instruction Tuning
M3^33IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for
  Pre-training and Benchmarks
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Haiyang Xu
Qinghao Ye
Xuan-Wei Wu
Mingshi Yan
Yuan Miao
...
Qingfang Qian
Maofei Que
Ji Zhang
Xiaoyan Zeng
Feiyan Huang
VLM
MLLM
41
23
0
07 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
53
948
0
05 Jun 2023
Biologically-Motivated Learning Model for Instructed Visual Processing
Biologically-Motivated Learning Model for Instructed Visual Processing
R. Abel
S. Ullman
20
0
0
04 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
23
22
0
02 Jun 2023
MetaVL: Transferring In-Context Learning Ability From Language Models to
  Vision-Language Models
MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models
Masoud Monajatipoor
Liunian Harold Li
Mozhdeh Rouhsedaghat
Lin F. Yang
Kai-Wei Chang
MLLM
LRM
19
12
0
02 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for
  Biomedicine in One Day
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
51
697
0
01 Jun 2023
Versatile Backdoor Attack with Visible, Semantic, Sample-Specific, and
  Compatible Triggers
Versatile Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers
Ruotong Wang
Hongrui Chen
Zihao Zhu
Li Liu
Baoyuan Wu
DiffM
30
11
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via
  Dynamic Visual Prompting
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
16
0
0
01 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
29
2
0
01 Jun 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL
  Models
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Rameswar Panda
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
31
52
0
31 May 2023
Contextual Object Detection with Multimodal Large Language Models
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
27
77
0
29 May 2023
GlyphControl: Glyph Conditional Control for Visual Text Generation
GlyphControl: Glyph Conditional Control for Visual Text Generation
Yukang Yang
Dongnan Gui
Yuhui Yuan
Weicong Liang
Haisong Ding
Hang-Rui Hu
Kai Chen
DiffM
27
77
0
29 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating
  Vision-Language Transformers
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
30
22
0
27 May 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
28
241
0
26 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-man Cheung
Min-Bin Lin
VLM
AAML
MLLM
19
166
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language
  Catalyst
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
32
52
0
25 May 2023
Towards Language-guided Interactive 3D Generation: LLMs as Layout
  Interpreter with Generative Feedback
Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback
Yiqi Lin
Hao Wu
Ruichen Wang
H. Lu
Xiaodong Lin
Hui Xiong
Lin Wang
3DV
42
12
0
25 May 2023
The False Promise of Imitating Proprietary LLMs
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
27
196
0
25 May 2023
PandaGPT: One Model To Instruction-Follow Them All
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
34
274
0
25 May 2023
Rethinking the Evaluation Protocol of Domain Generalization
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu
Xingxuan Zhang
Renzhe Xu
Jiashuo Liu
Yue He
Peng Cui
OOD
24
7
0
24 May 2023
Visually-Situated Natural Language Understanding with Contrastive
  Reading Model and Frozen Large Language Models
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Geewook Kim
Hodong Lee
D. Kim
Haeji Jung
S. Park
Yoon Kim
Sangdoo Yun
Taeho Kil
Bado Lee
Seunghyun Park
VLM
35
4
0
24 May 2023
Previous
123...62636465
Next