data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown

Multi-Modal Recommendation System with Auxiliary Information

Mufhumudzi Muthivhi

Terence L van Zyl

Hairong Wang

13 Oct 2022

The Hidden Uniform Cluster Prior in Self-Supervised Learning

International Conference on Learning Representations (ICLR), 2022

Pascal Vincent

208

13 Oct 2022

On Compressing Sequences for Self-Supervised Speech Models

Spoken Language Technology Workshop (SLT), 2022

Jiatong Shi

Hao Tang

197

13 Oct 2022

Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022

157

13 Oct 2022

Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

205

11 Oct 2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022

Jing Liu

303

09 Oct 2022

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Interspeech (Interspeech), 2022

Haizhou Li

225

08 Oct 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

259

07 Oct 2022

Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining

H. S. Bovbjerg

Zheng-Hua Tan

VLM

207

04 Oct 2022

That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation

Conference on Robot Learning (CoRL), 2022

Abitha Thankaraj

Lerrel Pinto

168

03 Oct 2022

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Spoken Language Technology Workshop (SLT), 2022

403

03 Oct 2022

Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

276

30 Sep 2022

Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio

Spoken Language Technology Workshop (SLT), 2022

Yan Gao

Javier Fernandez-Marques

Titouan Parcollet

Pedro Porto Buarque de Gusmão

Nicholas D. Lane

201

30 Sep 2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

...

312

30 Sep 2022

TVLT: Textless Vision-Language Transformer

Neural Information Processing Systems (NeurIPS), 2022

344

28 Sep 2022

An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

126

28 Sep 2022

Implementing and Experimenting with Diffusion Models for Text-to-Image Generation

Robin Zbinden

135

22 Sep 2022

Deep Lake: a Lakehouse for Deep Learning

Conference on Innovative Data Systems Research (CIDR), 2022

...

217

22 Sep 2022

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models

R. Olivier

H. Abdullah

Bhiksha Raj

AAML

268

17 Sep 2022

Exploring Target Representations for Masked Autoencoders

International Conference on Learning Representations (ICLR), 2022

671

08 Sep 2022

Generalization in Neural Networks: A Broad Survey

Neurocomputing (Neurocomputing), 2022

Chris Rohlfs

OOD AI4CE

279

04 Sep 2022

BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec

International Conference on Applied Informatics and Communication (ICAIC), 2022

Joon Sern Lee

Kai Keng Tay

Zong Fu Chua

02 Sep 2022

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Computer Vision and Pattern Recognition (CVPR), 2022

Jianmin Bao

...

Lu Yuan

281

222

25 Aug 2022

AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends

IEEE Open Journal of the Communications Society (OJ-COMS), 2022

241

117

23 Aug 2022

Estimating a potential without the agony of the partition function

SIAM Journal on Mathematics of Data Science (SIMODS), 2022

E. Haber

Moshe Eliasof

L. Tenorio

272

19 Aug 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

405

392

12 Aug 2022

MILAN: Masked Image Pretraining on Language Assisted Representation

302

11 Aug 2022

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

Computer Vision and Pattern Recognition (CVPR), 2022

Xiangwen Kong

Xiangyu Zhang

SSL

210

08 Aug 2022

SdAE: Self-distillated Masked Autoencoder

European Conference on Computer Vision (ECCV), 2022

216

31 Jul 2022

A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond

Kang Zhang

In So Kweon

SSL

234

30 Jul 2022

UAVM: Towards Unifying Audio and Visual Models

IEEE Signal Processing Letters (SPL), 2022

299

29 Jul 2022

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

Knowledge Discovery and Data Mining (KDD), 2022

...

Prahalad Venkataramanan

Zheng Wu

Pankaj Sitpure

CLL

221

19 Jul 2022

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

European Conference on Computer Vision (ECCV), 2022

Jianmin Bao

Lu Yuan

218

14 Jul 2022

u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality

Neural Information Processing Systems (NeurIPS), 2022

Wei-Ning Hsu

Bowen Shi

SSL VLM

316

14 Jul 2022

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models

Interspeech (Interspeech), 2022

180

14 Jul 2022

Masked Autoencoders that Listen

Neural Information Processing Systems (NeurIPS), 2022

Po-Yao (Bernie) Huang

Christoph Feichtenhofer

535

387

13 Jul 2022

391

08 Jul 2022

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Interspeech (Interspeech), 2022

Lei Xie

178

03 Jul 2022

FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy

Scientific Data (Sci Data), 2022

388

01 Jul 2022

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

Interspeech (Interspeech), 2022

Einari Vaaras

Manu Airaksinen

Okko Räsänen

127

21 Jun 2022

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Interspeech (Interspeech), 2022

206

21 Jun 2022

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm

International Journal of Computer Vision (IJCV), 2022

Jiangning Zhang

Xiangtai Li

Yabiao Wang

Chengjie Wang

304

19 Jun 2022

OmniMAE: Single Model Masked Pretraining on Images and Videos

Computer Vision and Pattern Recognition (CVPR), 2022

Rohit Girdhar

Alaaeldin El-Nouby

Mannat Singh

Kalyan Vasudev Alwala

Armand Joulin

Ishan Misra

ViT

268

118

16 Jun 2022

Masked Frequency Modeling for Self-Supervised Visual Pre-Training

International Conference on Learning Representations (ICLR), 2022

Xiaohang Zhan

246

100

15 Jun 2022

Masked Siamese ConvNets

208

15 Jun 2022

Language Models are General-Purpose Interfaces

216

110

13 Jun 2022

Extreme Masking for Learning Instance and Distributed Visual Representations

296

09 Jun 2022

Words are all you need? Language as an approximation for human similarity judgments

International Conference on Learning Representations (ICLR), 2022

262

08 Jun 2022

Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

International Conference on Learning Representations (ICLR), 2022

349

08 Jun 2022

Masked Unsupervised Self-training for Label-free Image Classification

International Conference on Learning Representations (ICLR), 2022

Junnan Li

Silvio Savarese

Steven C. H. Hoi

VLM SSL

148

07 Jun 2022