arXiv:2305.01278
VPGTrans: Transfer Visual Prompt Generator across LLMs
2 May 2023
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
Papers citing "VPGTrans: Transfer Visual Prompt Generator across LLMs"
50 / 76 papers shown
Title
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li
Jiashu Qu
Yuxiao Zhou
Yuehan Qin
Tiankai Yang
Yue Zhao
81
1
0
08 Mar 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
86
2
0
24 Feb 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
59
37
0
31 Dec 2024
Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies
Jiajie Yu
Yuhong Wang
Wei Ma
OffRL
34
1
0
14 Oct 2024
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
You Qin
Wei Ji
Xinze Lan
Hao Fei
Xun Yang
Dan Guo
Roger Zimmermann
Lizi Liao
VGen
41
0
0
08 Oct 2024
See Where You Read with Eye Gaze Tracking and Large Language Model
Sikai Yang
Gang Yan
Wan Du
33
0
0
28 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
Visual Prompting in Multimodal Large Language Models: A Survey
Junda Wu
Zhehao Zhang
Yu Xia
Xintong Li
Zhaoyang Xia
...
Subrata Mitra
Dimitris N. Metaxas
Lina Yao
Jingbo Shang
Julian McAuley
VLM
LRM
44
12
0
05 Sep 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
34
3
0
09 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
42
30
0
02 Aug 2024
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Junda Wu
Xintong Li
Tong Yu
Yu-Xiang Wang
Xiang Chen
Jiuxiang Gu
Lina Yao
Jingbo Shang
Julian McAuley
37
0
0
29 Jul 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
59
31
0
07 Jun 2024
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng
Yibo Miao
Yinpeng Dong
Xiao Yang
Xiao-Shan Gao
Jun Zhu
AAML
27
3
0
29 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
Relay Decoding: Concatenating Large Language Models for Machine Translation
Chengpeng Fu
Xiaocheng Feng
Yi-Chong Huang
Wenshuai Huo
Baohang Li
Hui Wang
Bing Qin
Ting Liu
24
0
0
05 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
80
139
0
29 Apr 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
30
39
0
25 Apr 2024
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning
Yian Li
Wentao Tian
Yang Jiao
Jingjing Chen
Yueping Jiang
Bin Zhu
Na Zhao
Yu-Gang Jiang
LRM
38
9
0
19 Apr 2024
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang
Yinpeng Dong
Siyuan Zhang
Tianzan Min
Hang Su
Jun Zhu
LRM
VLM
44
5
0
17 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
36
3
0
19 Mar 2024
OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen
Jing Bi
A. Vosoughi
Yapeng Tian
Pooyan Fazli
Chenliang Xu
40
8
0
27 Feb 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu
Tian-Xin Li
Quanfeng Lu
Wenqi Shao
Junjun He
Yu Qiao
Ping Luo
ELM
LM&MA
29
50
0
14 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
21
1
0
04 Feb 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
27
45
0
04 Jan 2024
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
45
244
0
20 Dec 2023
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng
Yang Bai
Tao Luo
Zhen Li
Salman Khan
Wangmeng Zuo
Xinxing Xu
Rick Siow Mong Goh
Yong-Jin Liu
27
5
0
19 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
19
9
0
11 Dec 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM
VLM
23
66
0
28 Nov 2023
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
Yuanfeng Ji
Chongjian Ge
Weikai Kong
Enze Xie
Zhengying Liu
Zhengguo Li
Ping Luo
MLLM
ELM
28
7
0
24 Nov 2023
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu
Xiaoyang Wang
Wenlin Yao
Jianshu Chen
Kaiqiang Song
Sangwoo Cho
Yaser Yacoob
Dong Yu
21
98
0
15 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
40
51
0
08 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Bo-wen Li
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
30
65
0
07 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
21
53
0
31 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Bill Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
30
113
0
24 Oct 2023
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks
Xiaocui Yang
Wenfang Wu
Shi Feng
Ming Wang
Daling Wang
Yang Li
Qi Sun
Yifei Zhang
Xiaoming Fu
Soujanya Poria
LRM
ELM
25
10
0
13 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
28
28
0
12 Oct 2023
Domain-wise Invariant Learning for Panoptic Scene Graph Generation
Li Li
Youxuan Qin
Wei Ji
Yuxiao Zhou
Roger Zimmermann
27
4
0
09 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
21
12
0
08 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
15
7
0
05 Oct 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
24
132
0
14 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
448
0
11 Sep 2023
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
Changli Wu
Yiwei Ma
Qi Chen
Haowei Wang
Gen Luo
Jiayi Ji
Xiaoshuai Sun
3DV
31
18
0
31 Aug 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
30
4
0
27 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
24
48
0
23 Aug 2023
Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems
Zeinab Taghavi
S. Gooran
Seyed Arshan Dalili
Hamidreza Amirzadeh
Mohammad Jalal Nematbakhsh
Hossein Sameti
18
2
0
20 Aug 2023
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
25
3
0
13 Aug 2023
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
Leigang Qu
Shengqiong Wu
Hao Fei
Liqiang Nie
Tat-Seng Chua
LM&Ro
DiffM
MLLM
35
88
0
09 Aug 2023
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling
Yu Zhao
Hao Fei
Yixin Cao
Bobo Li
Meishan Zhang
Jianguo Wei
M. Zhang
Tat-Seng Chua
17
13
0
09 Aug 2023
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Bobo Li
Hao Fei
Lizi Liao
Yu Zhao
Chong Teng
Tat-Seng Chua
Donghong Ji
Fei Li
19
30
0
08 Aug 2023