2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 609 papers shown
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
226
1
0
03 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
236
0
0
30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
388
34
0
27 Nov 2023
SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation
Jia Li
Yanyan Shen
Lei Chen
Charles Wang Wai Ng
206
6
0
27 Nov 2023
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
Pattern Recognition (Pattern Recogn.), 2023
Daesoo Lee
Sara Malacarne
Erlend Aune
AI4TS
338
25
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
252
4
0
21 Nov 2023
Self-Distilled Representation Learning for Time Series
Felix Pieper
Konstantin Ditschuneit
Martin Genzel
Alexandra Lindt
Johannes Otterbach
AI4TS
157
3
0
19 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Heng-Jui Chang
James R. Glass
246
8
0
15 Nov 2023
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification
Junyan Lin
Feng Gao
Xiaochen Shi
Junyu Dong
Q. Du
187
80
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siddharth Srivastava
Gaurav Sharma
SSL
288
85
0
07 Nov 2023
FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Lisa Weijler
Florian Kowarsch
Michael Reiter
Pedro Hermosilla
Margarita Maurer-Granofszky
Michael N. Dworzak
MedIm
169
5
0
06 Nov 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
290
3
0
06 Nov 2023
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Changdae Oh
Hyesu Lim
Mijoo Kim
Dongyoon Han
Junhyeok Park
Euiseog Jeong
Alexander G. Hauptmann
Zhi-Qi Cheng
Kyungwoo Song
VLM
743
30
0
03 Nov 2023
Investigating Relative Performance of Transfer and Meta Learning
Benji Alwis
90
0
0
31 Oct 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
180
4
0
30 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
International Conference on Learning Representations (ICLR), 2023
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
341
14
0
28 Oct 2023
Large-scale Foundation Models and Generative AI for BigData Neuroscience
Neurosciences research (Neurosci Res), 2023
Ran Wang
Zhe Sage Chen
MedIm
AI4CE
LRM
181
18
0
27 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Neural Information Processing Systems (NeurIPS), 2023
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
212
6
0
25 Oct 2023
Fine tuning Pre trained Models for Robustness Under Noisy Labels
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Sumyeong Ahn
Sihyeon Kim
Jongwoo Ko
SeYoung Yun
AAML
NoLa
372
16
0
24 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
395
11
0
22 Oct 2023
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim
Sanghyuk Chun
Byeongho Heo
Dongyoon Han
SSL
294
3
0
20 Oct 2023
A Car Model Identification System for Streamlining the Automobile Sales Process
Said Togru
Marco Moldovan
218
0
0
19 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
Automatic Speech Recognition & Understanding (ASRU), 2023
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
178
3
0
16 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
397
7
0
12 Oct 2023
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning
ACM Multimedia (ACM MM), 2023
Jiaqi Li
Guilin Qi
Chuanyi Zhang
Yongrui Chen
Yiming Tan
Chenlong Xia
Ye Tian
210
6
0
12 Oct 2023
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo
Shuang Yang
Shiguang Shan
Xilin Chen
295
2
0
08 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Tianpeng Liu
Ming-Ming Cheng
SSL
366
3
0
08 Oct 2023
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks
Ofir Bar Tal
Adi Haviv
Amit H. Bermano
AAML
176
0
0
05 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
International Conference on Learning Representations (ICLR), 2023
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
273
36
0
04 Oct 2023
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
E. Zappala
Daniel Levine
Shiyang Zhang
S. Rizvi
Sacha Lévy
David van Dijk
168
1
0
02 Oct 2023
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Dongyuan Li
Yusong Wang
Kotaro Funakoshi
Manabu Okumura
347
5
0
30 Sep 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
212
5
0
29 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Shucheng Zhou
Marco Cristani
AI4TS
GNN
520
11
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Automatic Speech Recognition & Understanding (ASRU), 2023
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
265
13
0
26 Sep 2023
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
259
2
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
212
4
0
26 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Automatic Speech Recognition & Understanding (ASRU), 2023
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
200
9
0
25 Sep 2023
M³CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders
Qibo Qiu
Honghui Yang
Wenxiao Wang
Shun Zhang
Haiming Gao
Haochao Ying
Wei Hua
Xiaofei He
3DPC
204
0
0
23 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
246
29
0
19 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
221
0
0
14 Sep 2023
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Heng-Jui Chang
Ning Dong
Ruslan Mavlyutov
Sravya Popuri
Yu-An Chung
335
8
0
14 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
International Conference on Multimodal Interaction (ICMI), 2023
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
227
30
0
11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
IEEE Transactions on Automation Science and Engineering (IEEE TASE), 2023
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
289
22
0
10 Sep 2023
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Neural Information Processing Systems (NeurIPS), 2023
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tong Wang
Zhaoxiang Zhang
262
25
0
07 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Interspeech (Interspeech), 2023
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
239
6
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhichao Huang
Chutong Meng
Tom Ko
217
41
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
185
3
0
28 Aug 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
B. Bischl
Eyke Hüllermeier
Mina Rezaei
OOD
UQCV
293
4
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
208
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
240
19
0
28 Aug 2023