
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
    SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 609 papers shown
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
207
44
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
536
238
0
28 Mar 2023
On the Stepwise Nature of Self-Supervised Learning
International Conference on Machine Learning (ICML), 2023
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
301
41
0
27 Mar 2023
Decoupled Multimodal Distilling for Emotion Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yong Li
Yuan-Zheng Wang
Zhen Cui
185
167
0
24 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
463
70
0
21 Mar 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
IEEE International Conference on Computer Vision (ICCV), 2023
Jihao Liu
Tai Wang
Boxiao Liu
Qihang Zhang
Yu Liu
Jiaming Song
247
22
0
20 Mar 2023
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Maryam Fazel-Zarandi
Wei-Ning Hsu
SSL
142
13
0
20 Mar 2023
Right the docs: Characterising voice dataset documentation practices used in machine learning
Australasian Language Technology Association Workshop (ALTA), 2023
Kathy Reid
Elizabeth T. Williams
178
2
0
19 Mar 2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
Karmesh Yadav
Arjun Majumdar
Ram Ramrakhya
Naoki Yokoyama
Alexei Baevski
Z. Kira
Oleksandr Maksymets
Dhruv Batra
ViT
321
74
0
14 Mar 2023
AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+
Tianlin Li
Ying Wang
Ziwei Xuan
Guo-Jun Qi
ViT
178
4
0
14 Mar 2023
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Wenxiao Wang
Wei Chen
Qibo Qiu
Long Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wei Liu
231
96
0
13 Mar 2023
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
International Journal of Computer Vision (IJCV), 2023
Shiyang Feng
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Jiaming Song
Qiao Yu
180
25
0
09 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
158
6
0
09 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2023
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
205
68
0
09 Mar 2023
Centroid-centered Modeling for Efficient Vision Transformer Pre-training
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Xin Yan
Zuchao Li
Lefei Zhang
Bo Du
Dacheng Tao
VLM
146
1
0
08 Mar 2023
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
173
6
0
07 Mar 2023
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC, ViT
370
3
0
28 Feb 2023
Generic-to-Specific Distillation of Masked Autoencoders
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Huang
Zhiliang Peng
Li Dong
Furu Wei
Jianbin Jiao
QiXiang Ye
282
30
0
28 Feb 2023
Efficient Masked Autoencoders with Self-Consistency
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhaowen Li
Yousong Zhu
Zhiyang Chen
Wei Li
Honghui Dong
Rui Zhao
Ming Tang
Jinqiao Wang
267
3
0
28 Feb 2023
Phone and speaker spatial organization in self-supervised speech representations
Pablo Riera
M. Cerdeiro
L. Pepino
Luciana Ferrer
SSL
237
3
0
24 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
162
4
0
18 Feb 2023
Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition
Xuefeng Liang
Hexin Jiang
Wenxin Xu
Ying Zhou
170
3
0
17 Feb 2023
A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques
Applied Informatics (AI), 2023
Wenbin Li
Hakim Hacid
Ebtesam Almazrouei
Merouane Debbah
333
20
0
16 Feb 2023
Speech Enhancement with Multi-granularity Vector Quantization
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023
Xiaokang Zhao
Qiu-shi Zhu
Jie Zhang
163
0
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Journal of Computing and Information Science in Engineering (JCISE), 2023
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
356
65
0
14 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
397
43
0
10 Feb 2023
Representation Deficiency in Masked Language Modeling
International Conference on Learning Representations (ICLR), 2023
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
229
9
0
04 Feb 2023
ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics
Hamed Rahimi
Hubert Naacke
Camélia Constantin
B. Amann
BDL, AI4TS
354
7
0
03 Feb 2023
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Neural Information Processing Systems (NeurIPS), 2023
Jiaxiang Dong
Haixu Wu
Haoran Zhang
Li Zhang
Jianmin Wang
Mingsheng Long
AI4TS
485
143
0
02 Feb 2023
Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms
S. Ma
Jidong J. Yang
SSL
59
6
0
01 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
342
53
0
01 Feb 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
International Conference on Machine Learning (ICML), 2023
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
405
432
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023
Liya Wang
A. Tien
414
20
0
28 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
234
3
0
26 Jan 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Computer Vision and Pattern Recognition (CVPR), 2023
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL, AI4TS, MDE
471
596
0
19 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
203
6
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Computer Vision and Pattern Recognition (CVPR), 2023
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
194
14
0
17 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
579
366
0
13 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
IEEE International Conference on Computer Vision (ICCV), 2023
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Jingdong Sun
Kun He
Han Hu
331
60
0
05 Jan 2023
Trace Encoding in Process Mining: a survey and benchmarking
Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023
Sylvio Barbon Junior
Paolo Ceravolo
R. Oyamada
G. Tavares
AI4TS
250
31
0
05 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sucheng Ren
Fangyun Wei
Zheng Zhang
Han Hu
321
52
0
03 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
IEEE Transactions on Multimedia (IEEE TMM), 2022
Xin Ma
Yu Xie
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
351
16
0
31 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
274
44
0
20 Dec 2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
Changli Tang
Yujin Wang
Xie Chen
Weiqiang Zhang
125
3
0
20 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
277
9
0
19 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
International Conference on Machine Learning (ICML), 2022
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
400
483
0
18 Dec 2022
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
337
74
0
15 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
International Conference on Machine Learning (ICML), 2022
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM, SSL
364
123
0
14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
242
17
0
14 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Computer Vision and Pattern Recognition (CVPR), 2022
Renrui Zhang
Liuhui Wang
Yu Qiao
Shiyang Feng
Jiaming Song
3DPC
288
184
0
13 Dec 2022