
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

Sean Robertson

Ewan Dunbar

SSL

226

03 Dec 2023

Stochastic Vision Transformers with Wasserstein Distance-Aware Attention

Franciskus Xaverius Erick

Mina Rezaei

Johanna P. Müller

Bernhard Kainz

236

30 Nov 2023

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Zhengcong Fei

Mingyuan Fan

Junshi Huang

388

27 Nov 2023

SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation

206

27 Nov 2023

Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling

Pattern Recognition (Pattern Recogn.), 2023

Daesoo Lee

Sara Malacarne

Erlend Aune

AI4TS

338

21 Nov 2023

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Boyi Li

252

21 Nov 2023

Self-Distilled Representation Learning for Time Series

Felix Pieper

Konstantin Ditschuneit

157

19 Nov 2023

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Heng-Jui Chang

James R. Glass

246

15 Nov 2023

SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification

Junyu Dong

187

08 Nov 2023

OmniVec: Learning robust representations with cross modal sharing

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Siddharth Srivastava

Gaurav Sharma

SSL

288

07 Nov 2023

FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Margarita Maurer-Granofszky

Michael N. Dworzak

MedIm

169

06 Nov 2023

Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition

Quazi Sarwar Muhtaseem

Md. Tariqul Islam

Shammur A. Chowdhury

Firoj Alam

290

06 Nov 2023

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Neural Information Processing Systems (NeurIPS), 2023

Alexander G. Hauptmann

Zhi-Qi Cheng

Kyungwoo Song

VLM

743

03 Nov 2023

Investigating Relative Performance of Transfer and Meta Learning

Benji Alwis

31 Oct 2023

Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings

David Samuel

180

30 Oct 2023

Pre-training with Random Orthogonal Projection Image Modeling

International Conference on Learning Representations (ICLR), 2023

341

28 Oct 2023

Large-scale Foundation Models and Generative AI for BigData Neuroscience

Neuroscience Research (Neurosci Res), 2023

Ran Wang

Zhe Sage Chen

MedIm AI4CE LRM

181

27 Oct 2023

Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder

Neural Information Processing Systems (NeurIPS), 2023

212

25 Oct 2023

Fine tuning Pre trained Models for Robustness Under Noisy Labels

International Joint Conference on Artificial Intelligence (IJCAI), 2023

372

24 Oct 2023

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

395

22 Oct 2023

Learning with Unmasked Tokens Drives Stronger Vision Learners

294

20 Oct 2023

A Car Model Identification System for Streamlining the Automobile Sales Process

Said Togru

Marco Moldovan

218

19 Oct 2023

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

Automatic Speech Recognition & Understanding (ASRU), 2023

178

16 Oct 2023

Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Chanho Park

Chengsong Lu

Mingjie Chen

Thomas Hain

397

12 Oct 2023

Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning

ACM Multimedia (ACM MM), 2023

210

12 Oct 2023

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

295

08 Oct 2023

Enhancing Representations through Heterogeneous Self-Supervised Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

366

08 Oct 2023

OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks

176

05 Oct 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

International Conference on Learning Representations (ICLR), 2023

Jiatong Shi

273

04 Oct 2023

Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods

David van Dijk

168

02 Oct 2023

Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Automatic Speech Recognition & Understanding (ASRU), 2023

347

30 Sep 2023

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

212

29 Sep 2023

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

520

27 Sep 2023

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Automatic Speech Recognition & Understanding (ASRU), 2023

Jiatong Shi

Wangyou Zhang

265

26 Sep 2023

M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Muhammad Abdullah Jamal

Omid Mohareri

3DPC

259

26 Sep 2023

SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets

212

26 Sep 2023

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Automatic Speech Recognition & Understanding (ASRU), 2023

Xie Chen

200

25 Sep 2023

M³CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

204

23 Sep 2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Ziyang Ma

Wen Wu

Zhisheng Zheng

Yiwei Guo

Qian Chen

Shiliang Zhang

Xie Chen

246

19 Sep 2023

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks

Sizhou Chen

Songyang Gao

Sen Fang

221

14 Sep 2023

CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

335

14 Sep 2023

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

International Conference on Multimodal Interaction (ICMI), 2023

227

11 Sep 2023

Multimodal Fish Feeding Intensity Assessment in Aquaculture

IEEE Transactions on Automation Science and Engineering (IEEE TASE), 2023

289

10 Sep 2023

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

Neural Information Processing Systems (NeurIPS), 2023

262

07 Sep 2023

Leveraging Label Information for Multimodal Emotion Recognition

Interspeech (Interspeech), 2023

239

05 Sep 2023

RepCodec: A Speech Representation Codec for Speech Tokenization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Zhichao Huang

Chutong Meng

Tom Ko

217

31 Aug 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Interspeech (Interspeech), 2023

Zhisheng Zheng

Ziyang Ma

Yu Wang

Xie Chen

185

28 Aug 2023

Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning

Eyke Hüllermeier

293

28 Aug 2023

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Rilin Chen

Yuchen Hu

208

28 Aug 2023

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Computer Speech and Language (CSL), 2023

Mirco Ravanelli

240

28 Aug 2023