A Survey of Vision-Language Pre-Trained Models


18 February 2022
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
    VLM

Papers citing "A Survey of Vision-Language Pre-Trained Models"

50 / 124 papers shown
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Y. Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
9
0
0
13 May 2025
Vision and Language Integration for Domain Generalization
Yanmei Wang
Xiyao Liu
Fupeng Chu
Zhi-Long Han
VLM
39
0
0
17 Apr 2025
TactileNet: Bridging the Accessibility Gap with AI-Generated Tactile Graphics for Individuals with Vision Impairment
Adnan Khan
Alireza Choubineh
Mai A. Shaaban
Abbas Akkasi
Majid Komeili
DiffM
30
0
0
07 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
60
0
0
03 Apr 2025
PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vison Language Models
Mayank Nautiyal
Stela Arranz Gheorghe
Kristiana Stefa
Li Ju
Ida-Maria Sintorn
Prashant Singh
VLM
54
0
0
14 Mar 2025
Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models
Xiaozhen Qiao
Peng Huang
Jiakang Yuan
Xianda Guo
Bowen Ye
Zhe Sun
Xuelong Li
60
0
0
12 Mar 2025
Concept Corrector: Erase concepts on the fly for text-to-image diffusion models
Zheling Meng
Bo Peng
Xiaochuan Jin
Yueming Lyu
Wei Wang
Jing Dong
DiffM
38
2
0
22 Feb 2025
Learning Generalizable Prompt for CLIP with Class Similarity Knowledge
Sehun Jung
Hyang-won Lee
VLM
VPVLM
53
0
0
17 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
62
3
0
11 Feb 2025
Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models
Y. Ranasinghe
Vibashan Vs
James Uplinger
C. D. Melo
Vishal M. Patel
34
0
0
13 Jan 2025
CATALOG: A Camera Trap Language-guided Contrastive Learning Model
Julian D. Santamaria
Claudia Isaza
Jhony H. Giraldo
71
0
0
14 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards
Ziyu Chen
Zhiqing Xiao
Xinbei Jiang
Junbo Zhao
69
0
0
24 Nov 2024
Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
Satvik Dixit
Laurie M. Heller
Chris Donahue
VLM
62
5
0
18 Nov 2024
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
Ce Zhang
Simon Stepputtis
Katia P. Sycara
Yaqi Xie
VLM
35
5
0
16 Oct 2024
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
Sijie Cheng
Kechen Fang
Yangyang Yu
Sicheng Zhou
B. Li
Ye Tian
Tingguang Li
Lei Han
Yang Janet Liu
37
8
0
15 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
21
0
0
07 Oct 2024
The Wallpaper is Ugly: Indoor Localization using Vision and Language
Seth Pate
Lawson L. S. Wong
29
0
0
04 Oct 2024
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue
Zhangpu Li
Changhong Zou
Suxue Ma
Zhicheng Yang
Chen Du
...
Xingzhi Sun
Jing Xiao
Kai Zhang
Mei Han
LM&MA
46
1
0
26 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALM
VLM
61
4
0
06 Sep 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
26
1
0
24 Aug 2024
ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers
Aristi Papastavrou
Maria Lymperaiou
Giorgos Stamou
AI4CE
24
1
0
12 Aug 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
36
0
0
29 Jul 2024
Data Processing Techniques for Modern Multimodal Models
Yinheng Li
Han Ding
Hang Chen
VLM
27
0
0
27 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
36
10
0
03 Jul 2024
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani
Mohamed Hanini
Nihitha Malayarukil
Stergios Christodoulidis
Maria Vakalopoulou
Enzo Ferrante
14
0
0
02 Jul 2024
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes
Aamod Khatiwada
Harsha Kokel
Ibrahim Abdelaziz
Subhajit Chaudhury
Julian T Dolby
Oktie Hassanzadeh
Zhenhan Huang
Tejaswini Pedapati
Horst Samulowitz
Kavitha Srinivas
LMTD
23
2
0
28 Jun 2024
Open-vocabulary Pick and Place via Patch-level Semantic Maps
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
37
5
0
21 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
AAML
VLM
30
13
0
08 Jun 2024
MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4
Vahid Azizi
Fatemeh Koochaki
VLM
43
0
0
03 Jun 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
57
5
0
29 May 2024
EASI-Tex: Edge-Aware Mesh Texturing from Single Image
Sai Raj Kishore Perla
Yizhi Wang
Ali Mahdavi-Amiri
Hao Zhang
DiffM
37
9
0
27 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
19
7
0
27 May 2024
Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model
Zhonglong Chen
Changwei Song
Yining Chen
Jianqiang Li
Guanghui Fu
Yongsheng Tong
Qing Zhao
AI4MH
32
0
0
07 May 2024
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
Mingjie Ma
Zhihuan Yu
Yichao Ma
Guohui Li
LRM
33
1
0
22 Apr 2024
ECOR: Explainable CLIP for Object Recognition
Ali Rasekh
Sepehr Kazemi Ranjbar
Milad Heidari
Wolfgang Nejdl
VLM
33
4
0
19 Apr 2024
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Siyuan Huang
LRM
LM&Ro
38
28
0
16 Apr 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
43
7
0
15 Apr 2024
GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts
Z. Á. Milacski
Koichiro Niinuma
Ryosuke Kawamura
Fernando de la Torre
László A. Jeni
27
1
0
08 Apr 2024
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
Akshay Gopalkrishnan
Ross Greer
Mohan M. Trivedi
VLM
41
21
0
28 Mar 2024
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
79
18
0
19 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
54
3
0
15 Mar 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
Feng Xiao
Hongbin Xu
Qiuxia Wu
Wenxiong Kang
22
2
0
13 Mar 2024
Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule
Yi Xiao
Xiangxin Zhou
Qiang Liu
Liang Wang
AI4CE
30
3
0
07 Mar 2024
DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
Feng Hou
Jin Yuan
Ying Yang
Yang Liu
Yang Zhang
Cheng Zhong
Zhongchao Shi
Jianping Fan
Yong Rui
Zhiqiang He
VLM
39
1
0
05 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
62
12
0
05 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
30
15
0
03 Mar 2024
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
Xuantong Liu
Tianyang Hu
Wenjia Wang
Kenji Kawaguchi
Yuan Yao
DiffM
47
3
0
26 Feb 2024
A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li
Chih-Fan Hsu
Ming-Ching Chang
Wei-Chao Chen
OOD
36
2
0
20 Feb 2024
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Jianyuan Guo
Hanting Chen
Chengcheng Wang
Kai Han
Chang Xu
Yunhe Wang
VLM
13
16
0
06 Feb 2024