2304.00685
Cited By
Vision-Language Models for Vision Tasks: A Survey
3 April 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
Papers citing
"Vision-Language Models for Vision Tasks: A Survey"
50 / 73 papers shown
Title
Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection
Fangling Jiang
Qi Li
Bing Liu
Weining Wang
Caifeng Shan
Zhenan Sun
Ming-Hsuan Yang
44
0
0
06 May 2025
Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
Fangling Jiang
Qi Li
Weining Wang
Wei Shen
Bing Liu
Zhenan Sun
AAML
26
0
0
06 May 2025
DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Xinmeng Hou
Wuqi Wang
Long Yang
Hao Lin
Jinglun Feng
Haigen Min
Xiangmo Zhao
24
0
0
04 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
X. Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
24
0
0
04 May 2025
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
Muzammil Behzad
Guoying Zhao
VLM
51
0
0
28 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
J. Wu
Hao Yang
Xinhua Zeng
Guibing He
Z. Chen
Z. Li
X. Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
26
0
0
12 Apr 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa
V. Marsocci
Yuru Jia
Elliot J. Crowley
Mikolaj Czerkawski
DiffM
47
0
0
11 Apr 2025
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
Ashim Dahal
Saydul Akbar Murad
Nick Rahimi
VLM
33
0
0
30 Mar 2025
Generalizable and Explainable Deep Learning for Medical Image Computing: An Overview
A. Chaddad
Yan Hu
Yihang Wu
Binbin Wen
R. Kateb
47
6
0
11 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
43
0
0
05 Mar 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
51
2
0
11 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
47
7
0
04 Feb 2025
Human Re-ID Meets LVLMs: What can we expect?
Kailash A. Hambarde
Pranita Samale
Hugo Proença
56
0
0
30 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
65
2
0
22 Jan 2025
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
78
5
0
05 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
67
0
0
05 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
87
1
0
04 Dec 2024
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang
Zaiqiao Meng
Jake Lever
Edmond S. L. Ho
MedIm
94
0
0
28 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
36
1
0
01 Nov 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
39
6
0
14 Oct 2024
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Abhijit Mishra
Shreya Shukla
Jose Torres
Jacek Gwizdka
Shounak Roychowdhury
25
4
0
10 Oct 2024
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan
24
0
0
02 Oct 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe
VLM
31
2
0
19 Sep 2024
Bootstrapping Object-level Planning with Large Language Models
D. Paulius
Alejandro Agostini
Benedict Quartey
G. Konidaris
LM&Ro
29
1
0
18 Sep 2024
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
Yaning Zhang
Tianyi Wang
Zitong Yu
Zan Gao
Linlin Shen
Shengyong Chen
DiffM
65
3
0
15 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
21
0
0
14 Sep 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
M. Zhang
DiffM
35
1
0
16 Aug 2024
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw
Ye Wang
Jing Liu
T. Koike-Akino
VLM
MedIm
LM&MA
19
1
0
15 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
14
17
0
08 Jul 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
47
1
0
21 Jun 2024
MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
Shuo Xu
Sai Wang
Xinyue Hu
Yutian Lin
Bo Du
Yu Wu
CoGe
28
0
0
18 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
27
1
0
11 Jun 2024
Language-guided Detection and Mitigation of Unknown Dataset Bias
Zaiying Zhao
Soichiro Kumano
Toshihiko Yamasaki
18
2
0
05 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
49
38
0
23 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
12
27
0
18 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
49
3
0
14 May 2024
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee
Xiaoming Zhai
14
4
0
12 May 2024
Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
Evan M. Williams
Kathleen M. Carley
CoGe
31
0
0
10 May 2024
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella
Ismail Ben Ayed
VLM
MLLM
22
22
0
03 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
76
3
0
29 Apr 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
33
16
0
24 Apr 2024
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh
Byung-Cheol Min
LM&Ro
51
1
0
02 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
16
18
0
30 Mar 2024
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making
Shuai Ma
Qiaoyi Chen
Xinru Wang
Chengbo Zheng
Zhenhui Peng
Ming Yin
Xiaojuan Ma
ELM
15
17
0
25 Mar 2024
To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions
Daniel Tanneberg
Felix Ocker
Stephan Hasler
Joerg Deigmoeller
Anna Belardinelli
Chao Wang
H. Wersing
Bernhard Sendhoff
Michael Gienger
LM&Ro
45
12
0
19 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
37
7
0
29 Feb 2024
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare
Akash Ghosh
Arkadeep Acharya
Raghav Jain
Sriparna Saha
Aman Chadha
Setu Sinha
12
28
0
16 Dec 2023
Few-shot medical image classification with simple shape and texture text descriptors using vision-language models
Michal Byra
M. F. Rachmadi
Henrik Skibbe
VLM
20
6
0
08 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
A Survey of Label-Efficient Deep Learning for 3D Point Clouds
Aoran Xiao
Xiaoqin Zhang
Ling Shao
Shijian Lu
3DPC
15
18
0
31 May 2023