
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown

FastMIM: Expediting Masked Image Modeling Pre-training for Vision

198

13 Dec 2022

Jointly Learning Visual and Auditory Speech Representations from Raw Data

International Conference on Learning Representations (ICLR), 2022

309

12 Dec 2022

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

Jianmin Bao

Lu Yuan

171

12 Dec 2022

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

144

12 Dec 2022

Deep Architectures for Content Moderation and Movie Content Rating

Fatih Çagatay Akyön

A. Temizel

214

08 Dec 2022

Group Generalized Mean Pooling for Vision Transformer

303

08 Dec 2022

Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit

Jianqing Gao

190

07 Dec 2022

Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

290

07 Dec 2022

Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Jianqing Gao

242

06 Dec 2022

Location-Aware Self-Supervised Transformers for Semantic Segmentation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

330

05 Dec 2022

MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning

Ge Zhang

...

Ruibo Liu

184

05 Dec 2022

Exploring Stochastic Autoregressive Image Modeling for Visual Representation

AAAI Conference on Artificial Intelligence (AAAI), 2022

Fan Yang

114

03 Dec 2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Interspeech (Interspeech), 2022

Xiaohuan Zhou

Jiaming Wang

Zeyu Cui

Shiliang Zhang

Zhijie Yan

Jingren Zhou

Chang Zhou

265

29 Nov 2022

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning

AAAI Conference on Artificial Intelligence (AAAI), 2022

Pritam Sarkar

Ali Etemad

388

25 Nov 2022

TESSP: Text-Enhanced Self-Supervised Speech Pre-training

212

24 Nov 2022

Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Yaowei Wang

205

23 Nov 2022

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

IEEE Transactions on Multimedia (IEEE TMM), 2022

274

21 Nov 2022

CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

IEEE International Conference on Computer Vision (ICCV), 2022

498

160

18 Nov 2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

Computer Vision and Pattern Recognition (CVPR), 2022

Weijie Su

Gao Huang

Yu Qiao

Xiaogang Wang

Jie Zhou

Jifeng Dai

245

17 Nov 2022

CAE v2: Context Autoencoder with CLIP Target

...

Errui Ding

Jingdong Wang

VLM CLIP

276

17 Nov 2022

Assessing Neural Network Robustness via Adversarial Pivotal Tuning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

Peter Ebert Christensen

Vésteinn Snaebjarnarson

Andrea Dittadi

Serge Belongie

Sagie Benaim

AAML

228

17 Nov 2022

Prompt Tuning for Parameter-efficient Medical Image Segmentation

179

16 Nov 2022

Stare at What You See: Masked Image Modeling without Reconstruction

Computer Vision and Pattern Recognition (CVPR), 2022

Yu Qiao

183

16 Nov 2022

Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

223

16 Nov 2022

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

621

901

14 Nov 2022

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Interspeech (Interspeech), 2022

Xie Chen

322

14 Nov 2022

SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

Yi Wang

Nassim Ait Ali Braham

Zhitong Xiong

Chenying Liu

C. Albrecht

Xiao Xiang Zhu

233

13 Nov 2022

MARLIN: Masked Autoencoder for facial video Representation LearnINg

Computer Vision and Pattern Recognition (CVPR), 2022

Zhixi Cai

Shreya Ghosh

Kalin Stefanov

Abhinav Dhall

Jianfei Cai

248

12 Nov 2022

Okapi: Generalising Better by Making Statistical Matches Match

Neural Information Processing Systems (NeurIPS), 2022

193

07 Nov 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems (NeurIPS), 2022

Kaizhi Qian

378

02 Nov 2022

data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

158

02 Nov 2022

Deep Multimodal Fusion for Generalizable Person Re-identification

308

02 Nov 2022

More Speaking or More Speakers?

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

224

02 Nov 2022

Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2022

Purang Abolmaesumi

145

01 Nov 2022

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sheng Li

187

01 Nov 2022

Training Vision-Language Models with Less Bimodal Supervision

Conference on Automated Knowledge Base Construction (AKBC), 2022

125

01 Nov 2022

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Xianghu Yue

Junyi Ao

Xiaoxue Gao

Haizhou Li

SSL

203

30 Oct 2022

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2022

Xie Chen

254

27 Oct 2022

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Yu-Chen Hu

182

27 Oct 2022

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

398

26 Oct 2022

AVES: Animal Vocalization Encoder based on Self-Supervision

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Masato Hagiwara

CLIP VLM AI4TS

178

26 Oct 2022

Learning Explicit Object-Centric Representations with Vision Transformers

Oscar Vikström

Alexander Ilin

OCL ViT

215

25 Oct 2022

Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future

Guo-Jun Qi

M. Shah

SSL

156

23 Oct 2022

Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Cheol Jun Cho

Peter Wu

Abdel-rahman Mohamed

Gopala K. Anumanchipalli

204

21 Oct 2022

Towards Sustainable Self-supervised Learning

354

20 Oct 2022

CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

Neural Information Processing Systems (NeurIPS), 2022

373

127

19 Oct 2022

A Unified View of Masked Image Modeling

242

19 Oct 2022

Continuous Pseudo-Labeling from the Start

International Conference on Learning Representations (ICLR), 2022

228

17 Oct 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Spoken Language Technology Workshop (SLT), 2022

Tzu-Quan Lin

...

255

16 Oct 2022

Improving generalizability of distilled self-supervised speech processing models under distorted settings

Spoken Language Technology Workshop (SLT), 2022

Kuan-Po Huang

Yu-Kuan Fu

Tsung-Yuan Hsu

Fabian Ritter-Gutierrez

254

14 Oct 2022