Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.07214
Cited By
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
20 February 2024
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions"
16 / 16 papers shown
Title
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
23
0
0
09 May 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
60
2
0
11 Feb 2025
Do Language Models Understand Time?
Xi Ding
Lei Wang
158
0
0
18 Dec 2024
Do large language vision models understand 3D shapes?
Sagi Eppel
3DV
81
1
0
14 Dec 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
63
89
0
04 Jan 2024
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
49
44
0
30 Nov 2023
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Nianwen Si
Hao Zhang
Heyu Chang
Wenlin Zhang
Dan Qu
Weiqiang Zhang
KELM
MU
70
26
0
27 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
154
280
0
14 Oct 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
ViT
132
167
0
08 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
164
401
0
10 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
1