2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 609 papers shown
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
198
15
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
309
70
0
12 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
171
52
0
12 Dec 2022
TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Lixin Cao
Jun Wang
Ben Yang
Jane Polak Scowcroft
Dong Yu
144
4
0
12 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temizel
214
8
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
303
3
0
08 Dec 2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li
Genshun Wan
Fenglin Ding
Hang Chen
Jianqing Gao
Jia Pan
Cong Liu
SSL
190
1
0
07 Dec 2022
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding
Genshun Wan
Pengcheng Li
Jia Pan
Cong Liu
SSL
290
1
0
07 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
242
15
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
330
23
0
05 Dec 2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning
Yi Zhou
Ruibin Yuan
Ge Zhang
Yi Ma
Chenghua Lin
...
Haoyu He
Emmanouil Benetos
Norbert Gyenge
Ruibo Liu
Jie Fu
SSL
184
28
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
114
16
0
03 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech (Interspeech), 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
265
13
0
29 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Pritam Sarkar
Ali Etemad
388
39
0
25 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
212
5
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
205
20
0
23 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE transactions on multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
274
51
0
21 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
IEEE International Conference on Computer Vision (ICCV), 2022
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
498
160
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Computer Vision and Pattern Recognition (CVPR), 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
245
56
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
276
25
0
17 Nov 2022
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Peter Ebert Christensen
Vésteinn Snaebjarnarson
Andrea Dittadi
Serge Belongie
Sagie Benaim
AAML
228
1
0
17 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
179
32
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2022
Hongwei Xue
Shiyang Feng
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
183
38
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taihao Li
S. Wermter
223
4
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
621
901
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Interspeech (Interspeech), 2022
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
322
21
0
14 Nov 2022
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation
Yi Wang
Nassim Ait Ali Braham
Zhitong Xiong
Chenying Liu
C. Albrecht
Xiao Xiang Zhu
233
96
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Computer Vision and Pattern Recognition (CVPR), 2022
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
248
93
0
12 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Neural Information Processing Systems (NeurIPS), 2022
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
193
3
0
07 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
378
10
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
158
5
0
02 Nov 2022
Deep Multimodal Fusion for Generalizable Person Re-identification
Suncheng Xiang
Hao Chen
Jing Gao
Jiawang Mou
Ting Liu
Xiaobo Li
Yuzhuo Fu
308
6
0
02 Nov 2022
More Speaking or More Speakers?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Dan Berrebbi
R. Collobert
Navdeep Jaitly
Tatiana Likhomanenko
224
6
0
02 Nov 2022
Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2022
P. Wilson
Mahdi Gilany
A. Jamzad
Fahimeh Fooladgar
Minh-Son To
Brian Wodlinger
Purang Abolmaesumi
P. Mousavi
145
19
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
187
9
0
01 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Conference on Automated Knowledge Base Construction (AKBC), 2022
Elad Segal
Ben Bogin
Jonathan Berant
VLM
125
2
0
01 Nov 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
203
8
0
30 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2022
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
254
2
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
182
43
0
27 Oct 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
398
43
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Masato Hagiwara
CLIP
VLM
AI4TS
178
45
0
26 Oct 2022
Learning Explicit Object-Centric Representations with Vision Transformers
Oscar Vikström
Alexander Ilin
OCL
ViT
215
5
0
25 Oct 2022
Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future
Guo-Jun Qi
M. Shah
SSL
156
8
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
204
44
0
21 Oct 2022
Towards Sustainable Self-supervised Learning
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
CLL
354
11
0
20 Oct 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Neural Information Processing Systems (NeurIPS), 2022
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
373
127
0
19 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
242
42
0
19 Oct 2022
Continuous Pseudo-Labeling from the Start
International Conference on Learning Representations (ICLR), 2022
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
228
17
0
17 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
255
38
0
16 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Spoken Language Technology Workshop (SLT), 2022
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter-Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
254
15
0
14 Oct 2022