ResearchTrend.AI
Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM

Papers citing "Visual Instruction Tuning"

50 / 3,278 papers shown
Boximator: Generating Rich and Controllable Motions for Video Synthesis
Jiawei Wang
Yuchen Zhang
Jiaxin Zou
Yan Zeng
Guoqiang Wei
Liping Yuan
Hang Li
DiffM
VGen
35
43
0
02 Feb 2024
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models
Zongbo Han
Zechen Bai
Haiyang Mei
Qianli Xu
Changqing Zhang
Mike Zheng Shou
VLM
37
7
0
02 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
36
9
0
02 Feb 2024
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving
Daocheng Fu
Wenjie Lei
Licheng Wen
Pinlong Cai
Song Mao
Min Dou
Botian Shi
Yu Qiao
54
28
0
02 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
90
55
0
02 Feb 2024
A Survey for Foundation Models in Autonomous Driving
Haoxiang Gao
Yaqian Li
Kaiwen Long
Ming Yang
Yiqing Shen
VLM
LRM
58
25
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
35
14
0
02 Feb 2024
Instruction Makes a Difference
Tosin Adewumi
Nudrat Habib
Lama Alkhaled
Elisa Barney
VLM
MLLM
26
1
0
01 Feb 2024
PICS: Pipeline for Image Captioning and Search
Grant Rosario
David Noever
199
0
0
01 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM
MLLM
35
117
0
01 Feb 2024
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
30
15
0
31 Jan 2024
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
26
5
0
31 Jan 2024
PVLR: Prompt-driven Visual-Linguistic Representation Learning for Multi-Label Image Recognition
Hao Tan
Zichang Tan
Jun Li
Jun Wan
Zhen Lei
VLM
38
0
0
31 Jan 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
50
1
0
31 Jan 2024
SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models
Xiao Shao
Weifu Jiang
Fei Zuo
Mengqing Liu
LLMAG
39
7
0
31 Jan 2024
MouSi: Poly-Visual-Expert Vision-Language Models
Xiaoran Fan
Tao Ji
Changhao Jiang
Shuo Li
Senjie Jin
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yunchun Jiang
VLM
32
16
0
30 Jan 2024
Large Language Model Evaluation via Matrix Entropy
Lai Wei
Zhiquan Tan
Chenghai Li
Jindong Wang
Weiran Huang
28
9
0
30 Jan 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
42
92
0
30 Jan 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
79
15
0
30 Jan 2024
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin
Jie Zhang
Zhenyu Huang
Jia-Wei Liu
Zujie Wen
Xi Peng
50
18
0
30 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
45
3
0
29 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
89
245
0
29 Jan 2024
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun
Hao Wu
Chenglu Zhu
Sunyi Zheng
Qizi Chen
...
Mengyue Zheng
Jingxiong Li
Xinheng Lyu
Tao Lin
Lin Yang
LM&MA
24
14
0
29 Jan 2024
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen
Zequn Jie
Lin Ma
MoE
50
47
0
29 Jan 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
52
109
0
29 Jan 2024
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
Shiyin Dong
Mingrui Zhu
Kun Cheng
Nannan Wang
Xinbo Gao
DiffM
30
3
0
29 Jan 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM
MLLM
MoE
48
154
0
29 Jan 2024
FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting
Tao Han
Song Guo
Fenghua Ling
Kang Chen
Junchao Gong
Jing-Jia Luo
Junxia Gu
Kan Dai
Wanli Ouyang
Lei Bai
AI4Cl
21
14
0
28 Jan 2024
Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization
Jianfei Xiao
Yancan Chen
Yimin Ou
Hanyi Yu
Kai Shu
Yiyong Xiao
ALM
31
11
0
27 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
24
7
0
25 Jan 2024
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
Tianhe Ren
Shilong Liu
Ailing Zeng
Jing Lin
Kunchang Li
...
Feng Li
Jie Yang
Hongyang Li
Qing Jiang
Lei Zhang
VLM
51
387
0
25 Jan 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
Zheqi He
Xinya Wu
Pengfei Zhou
Richeng Xuan
Guang Liu
Xi Yang
Qiannan Zhu
Hua Huang
ELM
LRM
30
14
0
25 Jan 2024
Towards 3D Molecule-Text Interpretation in Language Models
Sihang Li
Zhiyuan Liu
Yancheng Luo
Xiang Wang
Xiangnan He
Kenji Kawaguchi
Tat-Seng Chua
Qi Tian
AI4CE
40
43
0
25 Jan 2024
Democratizing Fine-grained Visual Recognition with Large Language Models
Mingxuan Liu
Subhankar Roy
Wenjing Li
Zhun Zhong
N. Sebe
Elisa Ricci
VLM
44
10
0
24 Jan 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
23
0
0
24 Jan 2024
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu
Jinjin Gu
Zheyuan Li
Jinfan Hu
Xiangtao Kong
Xintao Wang
Jingwen He
Yu Qiao
Chao Dong
36
129
0
24 Jan 2024
ChatterBox: Multi-round Multimodal Referring and Grounding
Yunjie Tian
Tianren Ma
Lingxi Xie
Jihao Qiu
Xi Tang
Yuan Zhang
Jianbin Jiao
Qi Tian
Qixiang Ye
33
14
0
24 Jan 2024
MLLMReID: Multimodal Large Language Model-based Person Re-identification
Shan Yang
Yongfei Zhang
LRM
34
2
0
24 Jan 2024
The Neglected Tails in Vision-Language Models
Shubham Parashar
Zhiqiu Lin
Tian Liu
Xiangjue Dong
Yanan Li
Deva Ramanan
James Caverlee
Shu Kong
VLM
45
33
0
23 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
42
5
0
23 Jan 2024
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Peiqi Liu
Yaswanth Orru
Jay Vakil
Chris Paxton
Nur Muhammad (Mahi) Shafiullah
Lerrel Pinto
LM&Ro
VLM
103
37
0
22 Jan 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
52
213
0
22 Jan 2024
The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian
Zhivar Sourati
Kexuan Sun
Jiarui Zhang
Yifan Jiang
Fred Morstatter
Jay Pujara
LRM
36
9
0
22 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo Li
Min Lin
MLLM
40
14
0
22 Jan 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Bin Cui
CoGe
DiffM
48
116
0
22 Jan 2024
LLMRA: Multi-modal Large Language Model based Restoration Assistant
Xiaoyu Jin
Yuan Shi
Bin Xia
Wenming Yang
44
4
0
21 Jan 2024
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
Kuofeng Gao
Yang Bai
Jindong Gu
Shu-Tao Xia
Philip Torr
Zhifeng Li
Wei Liu
VLM
22
39
0
20 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
32
4
0
19 Jan 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
36
5
0
19 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
52
19
0
19 Jan 2024