ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,278 papers shown
Title
Towards Language-Driven Video Inpainting via Multimodal Large Language
  Models
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu
Xiangtai Li
Chenyang Si
Shangchen Zhou
Jingkang Yang
...
Yining Li
Kai Chen
Yunhai Tong
Ziwei Liu
Chen Change Loy
VGen
DiffM
MLLM
41
17
0
18 Jan 2024
Supervised Fine-tuning in turn Improves Visual Foundation Models
Supervised Fine-tuning in turn Improves Visual Foundation Models
Xiaohu Jiang
Yixiao Ge
Yuying Ge
Dachuan Shi
Chun Yuan
Ying Shan
VLM
CLIP
46
8
0
18 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via
  Multi-modal Feature Synchronizer
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
85
43
0
18 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and
  Visual Question Generation
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
28
2
0
18 Jan 2024
Veagle: Advancements in Multimodal Representation Learning
Veagle: Advancements in Multimodal Representation Learning
Rajat Chawla
Arkajit Datta
Tushar Verma
Adarsh Jha
Anmol Gautam
Ayush Vatsal
Sukrit Chaterjee
NS Mukunda
Ishaan Bhola
VLM
21
4
0
18 Jan 2024
Temporal Insight Enhancement: Mitigating Temporal Hallucination in
  Multimodal Large Language Models
Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models
Li Sun
Liuan Wang
Jun Sun
Takayuki Okatani
MLLM
19
0
0
18 Jan 2024
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction
  Tuning with Large Language Model
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
MLLM
80
41
0
18 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with
  Bidirectional State Space Model
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
50
719
0
17 Jan 2024
Vlogger: Make Your Dream A Vlog
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen
DiffM
43
35
0
17 Jan 2024
Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with
  Positive Forward Transfer
Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer
Junhao Zheng
Qianli Ma
Zhen Liu
Binquan Wu
Hu Feng
CLL
37
14
0
17 Jan 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in
  3D World
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong
Zishuo Zheng
Peihao Chen
Yian Wang
Junyan Li
Chuang Gan
26
33
0
16 Jan 2024
MMToM-QA: Multimodal Theory of Mind Question Answering
MMToM-QA: Multimodal Theory of Mind Question Answering
Chuanyang Jin
Yutong Wu
Jing Cao
Jiannan Xiang
Yen-Ling Kuo
Zhiting Hu
T. Ullman
Antonio Torralba
Joshua B. Tenenbaum
Tianmin Shu
42
34
0
16 Jan 2024
AesBench: An Expert Benchmark for Multimodal Large Language Models on
  Image Aesthetics Perception
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
Yipo Huang
Quan Yuan
Xiangfei Sheng
Zhichao Yang
Haoning Wu
Pengfei Chen
Yuzhe Yang
Leida Li
Weisi Lin
VLM
24
38
0
16 Jan 2024
Towards A Better Metric for Text-to-Video Generation
Towards A Better Metric for Text-to-Video Generation
Jay Zhangjie Wu
Guian Fang
Haoning Wu
Xintao Wang
Yixiao Ge
...
Rui Zhao
Weisi Lin
Wynne Hsu
Ying Shan
Mike Zheng Shou
VGen
42
34
0
15 Jan 2024
Survey of Natural Language Processing for Education: Taxonomy,
  Systematic Review, and Future Trends
Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
Yunshi Lan
Xinyuan Li
Hanyue Du
Xuesong Lu
Ming Gao
Weining Qian
Aoying Zhou
45
2
0
15 Jan 2024
PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar
  Nonlinear Conservation Laws
PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation Laws
Liu Yang
Stanley J. Osher
AI4CE
48
19
0
14 Jan 2024
Generalizing Visual Question Answering from Synthetic to Human-Written
  Questions via a Chain of QA with a Large Language Model
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
Taehee Kim
Yeongjae Cho
Heejun Shin
Yohan Jo
Dongmyung Shin
37
4
0
12 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang
Bohan Zhuang
Qi Wu
14
11
0
12 Jan 2024
AffordanceLLM: Grounding Affordance from Vision Language Models
AffordanceLLM: Grounding Affordance from Vision Language Models
Shengyi Qian
Weifeng Chen
Min Bai
Xiong Zhou
Zhuowen Tu
Li Erran Li
25
21
0
12 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Distilling Vision-Language Models on Millions of Videos
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
41
13
0
11 Jan 2024
Hallucination Benchmark in Medical Visual Question Answering
Hallucination Benchmark in Medical Visual Question Answering
Jinge Wu
Yunsoo Kim
Honghan Wu
25
9
0
11 Jan 2024
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue
  Assistant
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant
Mohit Tomar
Abhisek Tiwari
Tulika Saha
Prince Jha
Sriparna Saha
22
1
0
10 Jan 2024
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
Jiawei Chen
Dingkang Yang
Yue Jiang
Yuxuan Lei
Lihua Zhang
LM&MA
MedIm
21
13
0
10 Jan 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence
  Lengths in Large Language Models
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
72
22
0
09 Jan 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with
  Large Language Models
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
Dingning Liu
Xiaoshui Huang
Yuenan Hou
Zhihui Wang
Zhen-fei Yin
Yongshun Gong
Peng Gao
Wanli Ouyang
29
8
0
09 Jan 2024
Language-Conditioned Robotic Manipulation with Fast and Slow Thinking
Language-Conditioned Robotic Manipulation with Fast and Slow Thinking
Minjie Zhu
Yichen Zhu
Jinming Li
Junjie Wen
Zhiyuan Xu
...
Yaxin Peng
Chaomin Shen
Dong Liu
Feifei Feng
Jian Tang
LM&Ro
40
15
0
08 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
22
12
0
07 Jan 2024
VLLaVO: Mitigating Visual Gap through LLMs
VLLaVO: Mitigating Visual Gap through LLMs
Shuhao Chen
Yulong Zhang
Weisen Jiang
Jiangang Lu
Yu Zhang
VLM
56
2
0
06 Jan 2024
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
Zeju Li
Chao Zhang
Xiaoyan Wang
Ruilong Ren
Yifan Xu
Ruifei Ma
Xiangde Liu
MLLM
30
20
0
06 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
32
4
0
06 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in
  Multimodal Large Language Models
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
43
8
0
06 Jan 2024
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident
  Analysis
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis
Kebin Wu
Wenbin Li
Xiaofei Xiao
21
3
0
05 Jan 2024
Object-Centric Instruction Augmentation for Robotic Manipulation
Object-Centric Instruction Augmentation for Robotic Manipulation
Junjie Wen
Yichen Zhu
Minjie Zhu
Jinming Li
Zhiyuan Xu
...
Yaxin Peng
Chaomin Shen
Dong Liu
Feifei Feng
Jian Tang
LM&Ro
69
16
0
05 Jan 2024
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal
  Models with Multiple Image Inputs
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs
Daoan Zhang
Junming Yang
Hanjia Lyu
Zijian Jin
Yuan Yao
Mingkai Chen
Jiebo Luo
46
34
0
05 Jan 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via
  Chart-to-Table Pre-training and Multitask Instruction Tuning
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
34
46
0
04 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
74
94
0
04 Jan 2024
Improved Zero-Shot Classification by Adapting VLMs with Text
  Descriptions
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha
Grant Van Horn
Subhransu Maji
VLM
47
20
0
04 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
46
43
0
03 Jan 2024
A Vision Check-up for Language Models
A Vision Check-up for Language Models
Pratyusha Sharma
Tamar Rott Shaham
Manel Baradad
Stephanie Fu
Adrian Rodriguez-Munoz
Shivam Duggal
Phillip Isola
Antonio Torralba
VLM
LRM
78
24
0
03 Jan 2024
Detours for Navigating Instructional Videos
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
34
6
0
03 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
46
220
0
03 Jan 2024
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
Tao Tang
Dafeng Wei
Zhengyu Jia
Tian Gao
Changwei Cai
...
Yixing Zhao
Fu Liu
Xiaodan Liang
Xianpeng Lang
Yang Wang
41
7
0
02 Jan 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected
  Multi-Modal Large Models
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding
Jinahua Han
Hang Xu
Xiaodan Liang
Wei Zhang
Xiaomeng Li
44
39
0
02 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document
  understanding
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
22
53
0
31 Dec 2023
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
34
18
0
30 Dec 2023
Pushing Boundaries: Exploring Zero Shot Object Classification with Large
  Multimodal Models
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models
Ashhadul Islam
Md. Rafiul Biswas
Wajdi Zaghouani
S. Belhaouari
Zubair Shah
VLM
21
3
0
30 Dec 2023
Tracking with Human-Intent Reasoning
Tracking with Human-Intent Reasoning
Jiawen Zhu
Zhi-Qi Cheng
Jun-Yan He
Chenyang Li
Bin Luo
Huchuan Lu
Yifeng Geng
Xuansong Xie
LRM
VOS
42
7
0
29 Dec 2023
An Empirical Study of Scaling Law for OCR
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
50
6
0
29 Dec 2023
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
70
84
0
29 Dec 2023
LISA++: An Improved Baseline for Reasoning Segmentation with Large
  Language Model
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
Senqiao Yang
Tianyuan Qu
Xin Lai
Zhuotao Tian
Bohao Peng
Shu Liu
Jiaya Jia
VLM
26
28
0
28 Dec 2023
Previous
123...535455...646566
Next