VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix


17 June 2022
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
    VLM

Papers citing "VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix"

30 / 30 papers shown
Title
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
52
0
0
01 Apr 2025
Diversity Covariance-Aware Prompt Learning for Vision-Language Models
Songlin Dong
Zhengdong Zhou
Chenhao Ding
Xinyuan Gao
Alex C. Kot
Yihong Gong
VPVLM
VLM
44
0
0
03 Mar 2025
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia
Sensen Gao
Qing-Wu Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Janet Liu
Ivor Tsang
Xiaochun Cao
AAML
35
3
0
04 Nov 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
23
0
0
02 Oct 2024
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models
Haonan Zheng
Wen Jiang
Xinyang Deng
Wenrui Li
VLM
AAML
16
2
0
06 Aug 2024
Hierarchical Memory for Long Video QA
Yiqin Wang
Haoji Zhang
Yansong Tang
Yong-Jin Liu
Jiashi Feng
Jifeng Dai
Xiaojie Jin
55
2
0
30 Jun 2024
Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
Lincan Cai
Shuang Li
Wenxuan Ma
Jingxuan Kang
Binhui Xie
Zixun Sun
Chengwei Zhu
MoE
MoMe
29
0
0
13 Jun 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
25
1
0
22 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing-Wu Guo
AAML
31
13
0
19 Mar 2024
Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Jieming Zhu
Zhenhua Dong
Zhou Zhao
19
6
0
08 Mar 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
12
0
15 Feb 2024
PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis
Efthymios Georgiou
Yannis Avrithis
Alexandros Potamianos
20
1
0
19 Dec 2023
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models
Yuxuan Lei
Jianxun Lian
Jing Yao
Xu Huang
Defu Lian
Xing Xie
LRM
13
5
0
18 Nov 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie
Yibo Yang
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
S. Song
EGVM
17
15
0
02 Sep 2023
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Baoshuo Kan
Teng Wang
Wenpeng Lu
Xiantong Zhen
Weili Guan
Feng Zheng
VPVLM
VLM
13
25
0
22 Aug 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
46
63
0
26 Jul 2023
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo
T. Wang
Selen Pehlivan
Abduljalil Radman
Jorma T. Laaksonen
VLM
17
2
0
14 Jul 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
14
1
0
24 May 2023
Text-based Person Search without Parallel Image-Text Data
Yang Bai
Jingyao Wang
Min Cao
Cheng Chen
Ziqiang Cao
Liqiang Nie
Min Zhang
19
13
0
22 May 2023
Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
Zimeng Qiu
Quanqi Hu
Zhuoning Yuan
Denny Zhou
Lijun Zhang
Tianbao Yang
27
11
0
19 May 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
13
43
0
31 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
89
9
0
24 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
13
24
0
09 Mar 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Jun Cen
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
VLM
6
52
0
06 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
195
0
20 Feb 2023
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Kuniaki Saito
Kihyuk Sohn
Xiang Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
17
61
0
06 Feb 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
25
15
0
05 Jan 2023
A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
Chengtai Cao
Fan Zhou
Yurou Dai
Jianping Wang
Kunpeng Zhang
AAML
11
26
0
21 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Mohit Bansal
Gedas Bertasius
VLM
16
78
0
09 Dec 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021