arXiv: 2202.03555 (v3, latest)
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 609 papers shown
Multi-Modal Recommendation System with Auxiliary Information
Mufhumudzi Muthivhi
Terence L van Zyl
Hairong Wang
98
4
0
13 Oct 2022
The Hidden Uniform Cluster Prior in Self-Supervised Learning
International Conference on Learning Representations (ICLR), 2022
Mahmoud Assran
Randall Balestriero
Quentin Duval
Florian Bordes
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Nicolas Ballas
SSL
208
62
0
13 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Spoken Language Technology Workshop (SLT), 2022
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
197
15
0
13 Oct 2022
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Haoyu Wang
Weiqiang Zhang
Hongbin Suo
Yulong Wan
157
1
0
13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
205
12
0
11 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
303
13
0
09 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Interspeech (Interspeech), 2022
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
225
7
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
259
62
0
07 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
H. S. Bovbjerg
Zheng-Hua Tan
VLM
207
5
0
04 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Abitha Thankaraj
Lerrel Pinto
168
20
0
03 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
403
40
0
03 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
276
10
0
30 Sep 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Spoken Language Technology Workshop (SLT), 2022
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
201
9
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
312
68
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Neural Information Processing Systems (NeurIPS), 2022
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
344
36
0
28 Sep 2022
An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen
Silvan Mertes
Dominik Schiller
Elisabeth André
126
5
0
28 Sep 2022
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation
Robin Zbinden
135
5
0
22 Sep 2022
Deep Lake: a Lakehouse for Deep Learning
Conference on Innovative Data Systems Research (CIDR), 2022
S. Hambardzumyan
Abhina Tuli
Levon Ghukasyan
Fariz Rahman
Hrant Topchyan
...
Mark McQuade
M. Harutyunyan
Tatevik Hakobyan
I. Stranic
Davit Buniatyan
217
30
0
22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
268
1
0
17 Sep 2022
Exploring Target Representations for Masked Autoencoders
International Conference on Learning Representations (ICLR), 2022
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
671
58
0
08 Sep 2022
Generalization in Neural Networks: A Broad Survey
Neurocomputing (Neurocomputing), 2022
Chris Rohlfs
OOD
AI4CE
279
19
0
04 Sep 2022
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
International Conference on Applied Informatics and Communication (ICAIC), 2022
Joon Sern Lee
Kai Keng Tay
Zong Fu Chua
93
2
0
02 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
281
222
0
25 Aug 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends
IEEE Open Journal of the Communications Society (OJ-COMS), 2022
Muhammad Zawish
Fayaz Ali Dharejo
Sunder Ali Khowaja
Saleem Raza
Steven Davy
Kapal Dev
P. Bellavista
241
117
0
23 Aug 2022
Estimating a potential without the agony of the partition function
SIAM Journal on Mathematics of Data Science (SIMODS), 2022
E. Haber
Moshe Eliasof
L. Tenorio
272
2
0
19 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
405
392
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
302
83
0
11 Aug 2022
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
Computer Vision and Pattern Recognition (CVPR), 2022
Xiangwen Kong
Xiangyu Zhang
SSL
210
66
0
08 Aug 2022
SdAE: Self-distillated Masked Autoencoder
European Conference on Computer Vision (ECCV), 2022
Yabo Chen
Yuchen Liu
Dongsheng Jiang
Xiaopeng Zhang
Wenrui Dai
H. Xiong
Qi Tian
ViT
216
86
0
31 Jul 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
234
94
0
30 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
IEEE Signal Processing Letters (SPL), 2022
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
299
30
0
29 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Knowledge Discovery and Data Mining (KDD), 2022
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
221
8
0
19 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
European Conference on Computer Vision (ECCV), 2022
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
218
88
0
14 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Neural Information Processing Systems (NeurIPS), 2022
Wei-Ning Hsu
Bowen Shi
SSL
VLM
316
52
0
14 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Interspeech (Interspeech), 2022
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
180
34
0
14 Jul 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
535
387
0
13 Jul 2022
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
391
0
0
08 Jul 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
Interspeech (Interspeech), 2022
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
178
10
0
03 Jul 2022
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy
Scientific Data (Sci Data), 2022
Nikil Ravi
Pranshu Chaturvedi
Eliu A. Huerta
Zhengchun Liu
Ryan Chard
Aristana Scourtas
K. J. Schmidt
Kyle Chard
Ben Blaiszik
Ian Foster
388
42
0
01 Jul 2022
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition
Interspeech (Interspeech), 2022
Einari Vaaras
Manu Airaksinen
Okko Räsänen
127
7
0
21 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Interspeech (Interspeech), 2022
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
206
21
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
International Journal of Computer Vision (IJCV), 2022
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Jianlong Wu
Yong Liu
Dacheng Tao
ViT
304
47
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
268
118
0
16 Jun 2022
Masked Frequency Modeling for Self-Supervised Visual Pre-Training
International Conference on Learning Representations (ICLR), 2022
Jiahao Xie
Wei Li
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
246
100
0
15 Jun 2022
Masked Siamese ConvNets
L. Jing
Jiachen Zhu
Yann LeCun
SSL
208
37
0
15 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
216
110
0
13 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
296
24
0
09 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
International Conference on Learning Representations (ICLR), 2022
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas Griffiths
Nori Jacoby
262
22
0
08 Jun 2022
Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
International Conference on Learning Representations (ICLR), 2022
Jia Pan
Pan Zhou
Shuicheng Yan
SSL
349
20
0
08 Jun 2022
Masked Unsupervised Self-training for Label-free Image Classification
International Conference on Learning Representations (ICLR), 2022
Junnan Li
Silvio Savarese
Steven C. H. Hoi
VLM
SSL
148
19
0
07 Jun 2022