Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM

Papers citing "Visual Instruction Tuning"

50 / 3,228 papers shown
RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
Stefanos-Iordanis Papadopoulos
C. Koutlis
Symeon Papadopoulos
P. Petrantonakis
24
9
0
16 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
194
591
0
16 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
64
0
0
16 Nov 2023
"It's not like Jarvis, but it's pretty close!" -- Examining ChatGPT's
  Usage among Undergraduate Students in Computer Science
Ishika Joshi
Ritvik Budhiraja
Harshal D. Akolekar
Jagat Sesh Challa
Dhruv Kumar
29
27
0
16 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
40
14
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
19
9
0
15 Nov 2023
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
Giulio Antonio Abbo
Tony Belpaeme
19
1
0
15 Nov 2023
Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
Xiaobo Xia
Jiale Liu
Shaokun Zhang
Qingyun Wu
Hongxin Wei
Tongliang Liu
42
9
0
15 Nov 2023
Towards Open-Ended Visual Recognition with Large Language Model
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
VLM
22
8
0
14 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
36
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
36
223
0
14 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
33
210
0
13 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLM
VLM
29
94
0
13 Nov 2023
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
Yunxin Li
Longyue Wang
Baotian Hu
Xinyu Chen
Wanqi Zhong
Chenyang Lyu
Wei Wang
Min Zhang
ELM
32
21
0
13 Nov 2023
What Large Language Models Bring to Text-rich VQA?
Xuejing Liu
Wei Tang
Xinzhe Ni
Jinghui Lu
Rui Zhao
Zechao Li
Fei Tan
22
9
0
13 Nov 2023
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
45
15
0
13 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model
Minh-Hao Van
Xintao Wu
VLM
MLLM
30
10
0
12 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
38
242
0
11 Nov 2023
LayoutPrompter: Awaken the Design Ability of Large Language Models
Jiawei Lin
Jiaqi Guo
Shizhao Sun
Z. Yang
Jian-Guang Lou
Dongmei Zhang
VLM
34
22
0
11 Nov 2023
Online Advertisements with LLMs: Opportunities and Challenges
S. Feizi
Mohammadtaghi Hajiaghayi
Keivan Rezaei
Suho Shin
OffRL
23
10
0
11 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
71
4
0
10 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
129
118
0
09 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
56
105
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
27
12
0
09 Nov 2023
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
36
6
0
09 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
42
3
0
08 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Bo-wen Li
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
35
65
0
07 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
126
375
0
07 Nov 2023
Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI
Yaoxian Song
Penglei Sun
Haoyu Liu
Li Zhixu
Wei Song
Yanghua Xiao
Xiaofang Zhou
LM&Ro
53
13
0
07 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
41
201
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
27
446
0
06 Nov 2023
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
Zeren Chen
Ziqin Wang
Zhen Wang
Huayang Liu
Zhen-fei Yin
Si Liu
Lu Sheng
Wanli Ouyang
Yu Qiao
Jing Shao
MoE
36
7
0
05 Nov 2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Jing Pan
Jian Wu
Yashesh Gaur
S. Sivasankaran
Zhuo Chen
Shujie Liu
Jinyu Li
ELM
29
26
0
03 Nov 2023
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Xinlu Zhang
Yujie Lu
Weizhi Wang
An Yan
Jun Yan
Lianke Qin
Heng Wang
Xifeng Yan
William Yang Wang
Linda R. Petzold
LM&MA
MLLM
ELM
30
75
0
02 Nov 2023
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images
Zalan Fabian
Zhongqi Miao
Chunyuan Li
Yuanhan Zhang
Ziwei Liu
...
Laura Siabatto
Andrés Link
Pablo Arbelaez
Rahul Dodhia
J. L. Ferres
44
10
0
02 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
21
7
0
02 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
72
22
0
02 Nov 2023
Emotion Detection for Misinformation: A Review
Zhiwei Liu
Tianlin Zhang
Kailai Yang
Paul Thompson
Zeping Yu
Sophia Ananiadou
20
28
0
01 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
34
10
0
01 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
13
34
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
37
36
0
01 Nov 2023
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai
Junru Shao
Siyuan Feng
Steven Lyubomirsky
Bohan Hou
...
Sunghyun Park
Prakalp Srivastava
Jared Roesch
T. Mowry
Tianqi Chen
47
9
0
01 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
23
54
0
31 Oct 2023
GG-LLM: Geometrically Grounding Large Language Models for Zero-shot Human Activity Forecasting in Human-Aware Task Planning
Moritz Graule
Volkan Isler
LM&Ro
24
14
0
30 Oct 2023
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
32
2
0
30 Oct 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
31
2
0
28 Oct 2023
Image Clustering Conditioned on Text Criteria
Sehyun Kwon
Jaeseung Park
Minkyu Kim
Jaewoong Cho
Ernest K. Ryu
Kangwook Lee
VLM
36
11
0
27 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
126
30
0
26 Oct 2023
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
Jindong Gu
Xiaojun Jia
Pau de Jorge
Wenqian Yu
Xinwei Liu
...
Anjun Hu
Ashkan Khakzar
Zhijiang Li
Xiaochun Cao
Philip H. S. Torr
AAML
29
26
0
26 Oct 2023
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors
You-Ming Chang
Chen Yeh
Wei-Chen Chiu
Ning Yu
VPVLM
VLM
78
22
0
26 Oct 2023