ResearchTrend.AI
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

24 August 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
    VLM
    MLLM

Papers citing "SimVLM: Simple Visual Language Model Pretraining with Weak Supervision"

50 / 565 papers shown
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
Enhao Zhang
Chaohua Li
Chuanxing Geng
Songcan Chen
47
0
0
08 May 2025
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Kai Wang
Kohou Wang
Shiguo Lian
44
0
0
25 Apr 2025
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
W. Zhang
Ju Jia
Xiaojun Jia
Yihao Huang
X. Li
Cong Wu
Lina Wang
AAML
28
0
0
15 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
26
1
0
14 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
45
0
0
31 Mar 2025
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
Haoqiang Lin
Haokun Wen
Xuemeng Song
Meng Liu
Yupeng Hu
Liqiang Nie
44
13
0
25 Mar 2025
Improved Alignment of Modalities in Large Vision Language Models
Kartik Jangra
Aman Kumar Singh
Yashwani Mann
Geetanjali Rathee
VLM
50
0
0
25 Mar 2025
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Yu Sun
Yin Li
R.-H. Sun
Chunhui Liu
Fangming Zhou
Ze Jin
Linjie Wang
Xiang Shen
Zhuolin Hao
Hongyu Xiong
VLM
40
0
0
21 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
60
0
0
13 Mar 2025
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Yuhan Wang
Cheng Liu
Daou Zhang
Weichao Wu
39
0
0
13 Mar 2025
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
B. Zhu
Jiequan Cui
H. Zhang
Chi Zhang
65
0
0
12 Mar 2025
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Yiheng Zhu
Mingyang Li
Junlong Liu
Kun Fu
J. Wu
Q. Li
Mingze Yin
Jieping Ye
Jian Wu
Z. Wang
55
0
0
06 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
51
0
0
03 Mar 2025
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
63
3
0
20 Feb 2025
HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads
Guobing Gan
Kaiming Gao
Li Wang
Shen Jiang
Peng Jiang
59
0
0
09 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
1
0
28 Jan 2025
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Yiliang Chen
Steven SC Ho
Cheng Xu
Yao Jie Xie
Wing-Fai Yeung
Shengfeng He
Jing Qin
LM&MA
28
0
0
06 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
45
18
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
43
0
0
03 Jan 2025
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Y. Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
30
1
0
25 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
72
0
0
05 Dec 2024
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim
Hyeongwoo Jeon
Jongseong Bae
Ha Young Kim
SLR
75
0
0
25 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
33
0
0
05 Nov 2024
Offline Evaluation of Set-Based Text-to-Image Generation
Negar Arabzadeh
Fernando Diaz
Junfeng He
EGVM
24
0
0
22 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
Muhe Ding
Yang Ma
Pengda Qin
Jianlong Wu
Yuhong Li
Liqiang Nie
16
0
0
18 Oct 2024
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets
Thomas Eiter
Jan Hadl
N. Higuera
J. Oetsch
11
0
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
25
2
0
12 Oct 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Lianyu Hu
Tongkai Shi
Wei Feng
Fanhua Shang
Liang Wan
VLM
17
0
0
09 Oct 2024
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding
Wei Yu Wu
Chao Wang
Liyi Chen
Mingze Yin
Yiheng Zhu
Kun Fu
Jieping Ye
Hui Xiong
Zheng Wang
28
1
0
04 Oct 2024
Generalizable Prompt Tuning for Vision-Language Models
Qian Zhang
VLM
VPVLM
43
0
0
04 Oct 2024
CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification
Jinghao Shi
Xiang Shen
Kaili Zhao
Xuedong Wang
Vera Wen
Zixuan Wang
Yifan Wu
Zhixin Zhang
21
0
0
03 Oct 2024
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations
Nikolaos Giakoumoglou
Tania Stathaki
SSL
38
2
0
03 Oct 2024
Saliency-Guided DETR for Moment Retrieval and Highlight Detection
Aleksandr Gordeev
Vladimir Dokholyan
Irina Tolstykh
Maksim Kuprashevich
21
4
0
02 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
28
0
0
01 Oct 2024
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
Anil Osman Tur
Alessandro Conti
Cigdem Beyan
Davide Boscaini
Roberto Larcher
S. Messelodi
Fabio Poiesi
Elisa Ricci
VLM
26
0
0
23 Sep 2024
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model
Li Zhou
Xu Yuan
Zenghui Sun
Zikun Zhou
Jingsong Lan
VLM
MLLM
36
2
0
20 Sep 2024
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
Kosuke Sakurai
Tatsuya Ishii
Ryotaro Shimizu
Linxin Song
Masayuki Goto
VLM
19
0
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
37
50
0
17 Sep 2024
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Ming Li
Keyu Chen
Ziqian Bi
Ming Liu
Benji Peng
...
Jinlang Wang
Sen Zhang
X. Pan
Jiawei Xu
Pohsun Feng
OffRL
34
2
0
17 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
31
0
0
17 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
35
3
0
15 Sep 2024
TriplePlay: Enhancing Federated Learning with CLIP for Non-IID Data and Resource Efficiency
Ahmed Imteaj
Md Zarif Hossain
Saika Zaman
Abdur R. Shahid
VLM
19
1
0
09 Sep 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
24
1
0
21 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
26
3
0
09 Aug 2024
The Data Addition Dilemma
Judy Hanwen Shen
Inioluwa Deborah Raji
Irene Y. Chen
22
5
0
08 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
35
0
0
07 Aug 2024
Unsupervised Domain Adaption Harnessing Vision-Language Pre-training
Wenlve Zhou
Zhiheng Zhou
VLM
31
6
0
05 Aug 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
28
0
0
29 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
32
5
0
18 Jul 2024
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang
Chun-Wun Cheng
Ke Yu
Zhihai He
Carola-Bibiane Schonlieb
Angelica I Aviles-Rivero
VLM
26
2
0
11 Jul 2024