Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM

Papers citing "Visual Instruction Tuning"

50 / 3,253 papers shown
Hijacking Context in Large Multi-modal Models
Joonhyun Jeong
MLLM
44
7
0
07 Dec 2023
Text as Image: Learning Transferable Adapter for Multi-Label Classification
Xueling Zhu
Jiuxin Cao
Jian Liu
Dongqi Tang
Furong Xu
Weijia Liu
Jiawei Ge
Bo Liu
Qingpei Guo
Tianyi Zhang
VLM
33
2
0
07 Dec 2023
VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models
Zongjie Li
Chaozheng Wang
Chaowei Liu
Pingchuan Ma
Daoyuan Wu
Shuai Wang
Cuiyun Gao
VLM
29
6
0
07 Dec 2023
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
45
2
0
07 Dec 2023
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai
Xiaoliang Dai
Lawrence Chen
Guan Pang
James M. Rehg
Miao Liu
41
15
0
06 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
51
83
0
06 Dec 2023
Diffusion Illusions: Hiding Images in Plain Sight
R. Burgert
Xiang Li
Abe Leite
Kanchana Ranasinghe
Michael S. Ryoo
52
17
0
06 Dec 2023
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming-Jun Nie
Renyuan Peng
Chunwei Wang
Xinyue Cai
Jianhua Han
Hang Xu
Li Zhang
LRM
34
45
0
06 Dec 2023
Synthesizing Physical Backdoor Datasets: An Automated Framework Leveraging Deep Generative Models
Sze Jue Yang
Chinh D. La
Quang H. Nguyen
Kok-Seng Wong
Anh Tran
Chee Seng Chan
Khoa D. Doan
AAML
21
0
0
06 Dec 2023
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui
Alejandro Aparcedo
Young Kyun Jang
Ser-Nam Lim
AAML
VLM
21
38
0
06 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
74
36
0
05 Dec 2023
Describing Differences in Image Sets with Natural Language
Lisa Dunlap
Yuhui Zhang
Xiaohan Wang
Ruiqi Zhong
Trevor Darrell
Jacob Steinhardt
Joseph E. Gonzalez
Serena Yeung-Levy
CoGe
VLM
32
30
0
05 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
49
29
0
05 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chunyuan Li
Jianwei Yang
91
68
0
05 Dec 2023
Diversified in-domain synthesis with efficient fine-tuning for few-shot classification
Victor G. Turrisi da Costa
Nicola Dall’Asen
Yiming Wang
N. Sebe
Elisa Ricci
46
3
0
05 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM
VLM
36
31
0
05 Dec 2023
Customization Assistant for Text-to-image Generation
Yufan Zhou
Ruiyi Zhang
Jiuxiang Gu
Tongfei Sun
DiffM
27
11
0
05 Dec 2023
Large Language Models on Graphs: A Comprehensive Survey
Bowen Jin
Gang Liu
Chi Han
Meng Jiang
Heng Ji
Jiawei Han
AI4CE
31
138
0
05 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
31
10
0
05 Dec 2023
Retrieving Conditions from Reference Images for Diffusion Models
Haoran Tang
Xin Zhou
Jieren Deng
Zhihong Pan
Hao Tian
Pratik Chaudhari
31
3
0
05 Dec 2023
Towards More Unified In-context Visual Understanding
Dianmo Sheng
Dongdong Chen
Zhentao Tan
Qiankun Liu
Qi Chu
Jianmin Bao
Tao Gong
Bin Liu
Shengwei Xu
Nenghai Yu
MLLM
VLM
43
10
0
05 Dec 2023
Visual Hindsight Self-Imitation Learning for Interactive Navigation
Kibeom Kim
Kisung Shin
Min Whoo Lee
Moonhoen Lee
Minsu Lee
Byoung-Tak Zhang
29
2
0
05 Dec 2023
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De-Chun Cheng
Jie Li
Nannan Wang
Xinbo Gao
34
1
0
05 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo-Wen Zhang
Xiangxiang Chu
MLLM
LRM
29
23
0
05 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
35
10
0
04 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Alan L. Yuille
Cihang Xie
VLM
57
7
0
04 Dec 2023
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
25
174
0
04 Dec 2023
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng
Shijia Huang
Lin Zhao
Yiwu Zhong
Liwei Wang
LM&Ro
41
41
0
04 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
38
0
0
04 Dec 2023
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
MLLM
43
11
0
04 Dec 2023
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models
Bingshuai Liu
Chenyang Lyu
Zijun Min
Zhanyu Wang
Jinsong Su
Longyue Wang
LRM
36
7
0
04 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Yizhou Wang
YiXuan Wu
Shixiang Tang
Weizhen He
Xun Guo
...
Lei Bai
Rui Zhao
Jian Wu
Tong He
Wanli Ouyang
VLM
44
14
0
04 Dec 2023
CLAMP: Contrastive LAnguage Model Prompt-tuning
Piotr Teterwak
Ximeng Sun
Bryan A. Plummer
Kate Saenko
Ser-Nam Lim
MLLM
VLM
40
1
0
04 Dec 2023
Good Questions Help Zero-Shot Image Reasoning
Kaiwen Yang
Tao Shen
Xinmei Tian
Xiubo Geng
Chongyang Tao
Dacheng Tao
Dinesh Manocha
LRM
34
7
0
04 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang
Jieru Mei
Alan L. Yuille
VLM
32
55
0
04 Dec 2023
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li
Jiawei Peng
Huiyi Chen
Chongyang Gao
Xu Yang
MLLM
17
20
0
04 Dec 2023
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
101
10
0
04 Dec 2023
Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation
Yuzhe Lu
Sungmin Hong
Yash Shah
Panpan Xu
LM&MA
MedIm
32
7
0
03 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
29
9
0
03 Dec 2023
Language-driven All-in-one Adverse Weather Removal
Hao Yang
Liyuan Pan
Yan Yang
Wei Liang
VLM
KELM
29
18
0
03 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
21
21
0
01 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
31
18
0
01 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM
VLM
22
156
0
01 Dec 2023
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai
Haotian Liu
Dennis Park
Siva Karthik Mustikovela
Gregory P. Meyer
Yuning Chai
Yong Jae Lee
VLM
LRM
MLLM
46
85
0
01 Dec 2023
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang
Tianxing Wu
Shuai Yang
Chenyang Si
Dahua Lin
Yu Qiao
Chen Change Loy
Ziwei Liu
DiffM
VGen
40
65
0
01 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
144
177
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
38
50
0
01 Dec 2023
ChatPose: Chatting about 3D Human Pose
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
26
38
0
30 Nov 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Chenyu You
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
41
45
0
30 Nov 2023