Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2108.10904
Cited By
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
24 August 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SimVLM: Simple Visual Language Model Pretraining with Weak Supervision"
50 / 565 papers shown
Title
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
Enhao Zhang
Chaohua Li
Chuanxing Geng
Songcan Chen
47
0
0
08 May 2025
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Kai Wang
Kohou Wang
Shiguo Lian
44
0
0
25 Apr 2025
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
W. Zhang
Ju Jia
Xiaojun Jia
Yihao Huang
X. Li
Cong Wu
Lina Wang
AAML
28
0
0
15 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
23
1
0
14 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
43
0
0
31 Mar 2025
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
Haoqiang Lin
Haokun Wen
Xuemeng Song
Meng Liu
Yupeng Hu
Liqiang Nie
44
13
0
25 Mar 2025
Improved Alignment of Modalities in Large Vision Language Models
Kartik Jangra
Aman Kumar Singh
Yashwani Mann
Geetanjali Rathee
VLM
50
0
0
25 Mar 2025
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Yu Sun
Yin Li
R.-H. Sun
Chunhui Liu
Fangming Zhou
Ze Jin
Linjie Wang
Xiang Shen
Zhuolin Hao
Hongyu Xiong
VLM
40
0
0
21 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
60
0
0
13 Mar 2025
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Yuhan Wang
Cheng Liu
Daou Zhang
Weichao Wu
39
0
0
13 Mar 2025
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
B. Zhu
Jiequan Cui
H. Zhang
Chi Zhang
65
0
0
12 Mar 2025
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Yiheng Zhu
Mingyang Li
Junlong Liu
Kun Fu
J. Wu
Q. Li
Mingze Yin
Jieping Ye
Jian Wu
Z. Wang
55
0
0
06 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
51
0
0
03 Mar 2025
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
60
3
0
20 Feb 2025
HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads
Guobing Gan
Kaiming Gao
Li Wang
Shen Jiang
Peng Jiang
56
0
0
09 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
1
0
28 Jan 2025
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Yiliang Chen
Steven SC Ho
Cheng Xu
Yao Jie Xie
Wing-Fai Yeung
Shengfeng He
Jing Qin
LM&MA
28
0
0
06 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
45
18
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
43
0
0
03 Jan 2025
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Y. Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
30
1
0
25 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
72
0
0
05 Dec 2024
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim
Hyeongwoo Jeon
Jongseong Bae
Ha Young Kim
SLR
72
0
0
25 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
33
0
0
05 Nov 2024
Offline Evaluation of Set-Based Text-to-Image Generation
Negar Arabzadeh
Fernando Diaz
Junfeng He
EGVM
24
0
0
22 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
Muhe Ding
Yang Ma
Pengda Qin
Jianlong Wu
Yuhong Li
Liqiang Nie
16
0
0
18 Oct 2024
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets
Thomas Eiter
Jan Hadl
N. Higuera
J. Oetsch
11
0
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
25
2
0
12 Oct 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Lianyu Hu
Tongkai Shi
Wei Feng
Fanhua Shang
Liang Wan
VLM
17
0
0
09 Oct 2024
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding
Wei Yu Wu
Chao Wang
Liyi Chen
Mingze Yin
Yiheng Zhu
Kun Fu
Jieping Ye
Hui Xiong
Zheng Wang
28
1
0
04 Oct 2024
Generalizable Prompt Tuning for Vision-Language Models
Qian Zhang
VLM
VPVLM
43
0
0
04 Oct 2024
CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification
Jinghao Shi
Xiang Shen
Kaili Zhao
Xuedong Wang
Vera Wen
Zixuan Wang
Yifan Wu
Zhixin Zhang
21
0
0
03 Oct 2024
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations
Nikolaos Giakoumoglou
Tania Stathaki
SSL
33
2
0
03 Oct 2024
Saliency-Guided DETR for Moment Retrieval and Highlight Detection
Aleksandr Gordeev
Vladimir Dokholyan
Irina Tolstykh
Maksim Kuprashevich
18
4
0
02 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
26
0
0
01 Oct 2024
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
Anil Osman Tur
Alessandro Conti
Cigdem Beyan
Davide Boscaini
Roberto Larcher
S. Messelodi
Fabio Poiesi
Elisa Ricci
VLM
24
0
0
23 Sep 2024
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model
Li Zhou
Xu Yuan
Zenghui Sun
Zikun Zhou
Jingsong Lan
VLM
MLLM
36
2
0
20 Sep 2024
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
Kosuke Sakurai
Tatsuya Ishii
Ryotaro Shimizu
Linxin Song
Masayuki Goto
VLM
16
0
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
29
50
0
17 Sep 2024
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Ming Li
Keyu Chen
Ziqian Bi
Ming Liu
Benji Peng
...
Jinlang Wang
Sen Zhang
X. Pan
Jiawei Xu
Pohsun Feng
OffRL
34
2
0
17 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
31
0
0
17 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
32
3
0
15 Sep 2024
TriplePlay: Enhancing Federated Learning with CLIP for Non-IID Data and Resource Efficiency
Ahmed Imteaj
Md Zarif Hossain
Saika Zaman
Abdur R. Shahid
VLM
16
1
0
09 Sep 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
24
1
0
21 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
21
3
0
09 Aug 2024
The Data Addition Dilemma
Judy Hanwen Shen
Inioluwa Deborah Raji
Irene Y. Chen
22
5
0
08 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
35
0
0
07 Aug 2024
Unsupervised Domain Adaption Harnessing Vision-Language Pre-training
Wenlve Zhou
Zhiheng Zhou
VLM
26
6
0
05 Aug 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
28
0
0
29 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
29
5
0
18 Jul 2024
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang
Chun-Wun Cheng
Ke Yu
Zhihai He
Carola-Bibiane Schonlieb
Angelica I Aviles-Rivero
VLM
26
2
0
11 Jul 2024
1
2
3
4
...
10
11
12
Next