Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Instruction Tuning"
50 / 3,278 papers shown
Title
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Minhyuk Seo
Diganta Misra
Seongwon Cho
Minjae Lee
Jonghyun Choi
CLL
44
7
0
16 Mar 2024
GAgent: An Adaptive Rigid-Soft Gripping Agent with Vision Language Models for Complex Lighting Environments
Zhuowei Li
Miao Zhang
Xiaotian Lin
Meng Yin
Shuai Lu
Xueqian Wang
45
6
0
16 Mar 2024
Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation
Anton Pelykh
Ozge Mercanoglu
Richard Bowden
DiffM
39
7
0
15 Mar 2024
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song
Zhifei Zhang
Zhe Lin
Scott D. Cohen
Brian L. Price
Jianming Zhang
Soo Ye Kim
He Zhang
Wei Xiong
Daniel G. Aliaga
DiffM
76
36
0
15 Mar 2024
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Tianxin Wei
Bowen Jin
Ruirui Li
Hansi Zeng
Zhengyang Wang
...
Qingyu Yin
Hanqing Lu
Suhang Wang
Jingrui He
Xianfeng Tang
60
17
0
15 Mar 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang
Yuhui Zhang
Orr Zohar
Serena Yeung-Levy
VLM
124
86
0
15 Mar 2024
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
Dongmin Park
Zhaofang Qian
Guangxing Han
Ser-Nam Lim
MLLM
48
0
0
15 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
38
30
0
15 Mar 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin
Yi Jiang
Lizhen Qu
Zehuan Yuan
Jianfei Cai
ObjD
VLM
53
13
0
15 Mar 2024
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
38
10
0
15 Mar 2024
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
61
40
0
14 Mar 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
44
92
0
14 Mar 2024
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang
Shenyuan Gao
Yihang Qiu
Li Chen
Tianyu Li
...
Ping Luo
Jun Zhang
Andreas Geiger
Yu Qiao
Hongyang Li
VGen
73
58
0
14 Mar 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu
Weicong Liang
Zhanhao Liang
Chong Luo
Ji Li
Gao Huang
Yuhui Yuan
DiffM
72
26
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
50
42
0
14 Mar 2024
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
Wonjun Kang
Kevin Galim
Hyung Il Koo
DiffM
39
5
0
14 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
51
10
0
14 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
44
12
0
14 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
46
7
0
14 Mar 2024
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
34
8
0
14 Mar 2024
Large Language Models and Causal Inference in Collaboration: A Survey
Xiaoyu Liu
Paiheng Xu
Junda Wu
Jiaxin Yuan
Yifan Yang
...
Haoliang Wang
Tong Yu
Julian McAuley
Wei Ai
Furong Huang
ELM
LRM
80
34
0
14 Mar 2024
PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning
Qifeng Zhou
Wenliang Zhong
Yuzhi Guo
Michael Xiao
Hehuan Ma
Junzhou Huang
54
10
0
13 Mar 2024
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Zicheng Zhang
Tong Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
QiXiang Ye
Wei Ke
VLM
57
2
0
13 Mar 2024
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
Cheng Chen
Sitong Su
Xu Luo
Hengtao Shen
Lianli Gao
Jingkuan Song
CLL
42
13
0
13 Mar 2024
Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
Zhonghan Zhao
Kewei Chen
Dongxu Guo
Wenhao Chai
Tianbo Ye
Yanting Zhang
Gaoang Wang
64
21
0
13 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
65
8
0
13 Mar 2024
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Juan Manuel Zambrano Chaves
Shih-Cheng Huang
Yanbo Xu
Hanwen Xu
Naoto Usuyama
...
Akshay S. Chaudhari
Serena Yeung-Levy
Curtis P. Langlotz
Sheng Wang
Hoifung Poon
VLM
LM&MA
73
10
0
12 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
57
18
0
12 Mar 2024
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
MLLM
VLM
53
20
0
12 Mar 2024
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
Han Huang
Haitian Zhong
Tao Yu
Qiang Liu
Shu Wu
Liang Wang
Tien-Ping Tan
VLM
KELM
33
8
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
50
18
0
12 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG
VGen
41
7
0
12 Mar 2024
Materials science in the era of large language models: a perspective
Ge Lei
Ronan Docherty
Samuel J. Cooper
50
18
0
11 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
59
122
0
11 Mar 2024
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning
Zijian Zhou
Miaojing Shi
Meng Wei
Oluwatosin O. Alabi
Zijie Yue
Tom Kamiel Magda Vercauteren
LM&MA
43
6
0
11 Mar 2024
Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
Jiawei Chen
Yue Jiang
Dingkang Yang
Mingcheng Li
Jinjie Wei
Ziyun Qian
Lihua Zhang
LM&MA
32
9
0
11 Mar 2024
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
Michael Andersland
27
0
0
11 Mar 2024
Transformer based Multitask Learning for Image Captioning and Object Detection
Debolena Basak
P. K. Srijith
M. Desarkar
32
1
0
10 Mar 2024
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu
Yichen Zhu
Xin Liu
Ning Liu
Zhiyuan Xu
Yaxin Peng
Chaomin Shen
Zhicai Ou
Feifei Feng
Jian Tang
VLM
57
20
0
10 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
42
3
0
10 Mar 2024
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
Tarun Kalluri
Bodhisattwa Prasad Majumder
Manmohan Chandraker
VLM
47
4
0
08 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
41
304
0
08 Mar 2024
VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object Detection via Vision-Language Model
Junsu Kim
Yunhoe Ku
Jihyeon Kim
Junuk Cha
Seungryul Baek
ObjD
VLM
42
12
0
08 Mar 2024
Debiasing Multimodal Large Language Models
Yi-Fan Zhang
Weichen Yu
Qingsong Wen
Xue Wang
Zhang Zhang
Liang Wang
Rong Jin
Tien-Ping Tan
58
4
0
08 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
59
72
0
08 Mar 2024
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Yunpeng Qu
Kun Yuan
Kai Zhao
Qizhi Xie
Jinhua Hao
Ming Sun
Chao Zhou
32
17
0
08 Mar 2024
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan
Che Liu
Xin Wang
Chaofan Tao
Hui Shen
Zhenwu Peng
Jie Fu
Rossella Arcucci
Huaxiu Yao
Mi Zhang
57
7
0
07 Mar 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLM
LRM
54
13
0
07 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
141
90
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
512
0
07 Mar 2024
Previous
1
2
3
...
48
49
50
...
64
65
66
Next