Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2202.03555
Cited By
v1
v2
v3 (latest)
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 609 papers shown
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
246
15
0
21 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
301
20
0
21 Mar 2024
CORN: Contact-based Object Representation for Nonprehensile Manipulation of General Unseen Objects
Yoonyoung Cho
Junhyek Han
Yoontae Cho
Beomjoon Kim
353
15
0
16 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Computer Speech and Language (CSL), 2024
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
190
4
0
13 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
360
20
0
08 Mar 2024
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed
J. Nawale
E. George
Sakshi Joshi
Kaushal Bhogale
...
M. ManickamK
C. V. Vaijayanthi
Krishnan Srinivasa Raghavan Karunganni
Pratyush Kumar
Mitesh M Khapra
238
37
0
04 Mar 2024
BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses
Weihao Zeng
Keqing He
Yejie Wang
Dayuan Fu
Weiran Xu
195
0
0
02 Mar 2024
Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido
Mahmoud Assran
Nicolas Ballas
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
277
51
0
01 Mar 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
Jiaqi Wang
Zhenxi Song
Zhengyu Ma
Xipeng Qiu
Min Zhang
Zhiguo Zhang
405
16
0
27 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
171
4
0
22 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
300
5
0
21 Feb 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-Yi Lee
391
19
0
20 Feb 2024
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
Wen Wu
Yue Liu
Chuxu Zhang
Chung-Cheng Chiu
Qiujia Li
Junwen Bai
Tara N. Sainath
P. Woodland
189
4
0
20 Feb 2024
A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li
Chih-Fan Hsu
Ming-Ching Chang
Wei-Chao Chen
OOD
244
2
0
20 Feb 2024
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Takanori Ashihara
Shoko Araki
J. Černocký
256
6
0
17 Feb 2024
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs
Navid Mohammadi Foumani
G. Mackellar
Soheila Ghane
Saad Irtza
Nam Nguyen
Mahsa Salehi
316
38
0
17 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
344
173
0
15 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
290
5
0
14 Feb 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Interspeech (Interspeech), 2024
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
253
5
0
12 Feb 2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang
Yi-Jen Shih
Heng-Jui Chang
Layne Berry
Puyuan Peng
Hung-yi Lee
Hsin-Min Wang
David Harwath
VLM
182
6
0
10 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
219
3
0
06 Feb 2024
The last Dance : Robust backdoor attack via diffusion models and bayesian approach
Orson Mengara
DiffM
595
4
0
05 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL
3DPC
342
46
0
04 Feb 2024
TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Taeyang Yun
Hyunkuk Lim
Jeong-Hoon Lee
Min Song
217
27
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
573
1
0
16 Jan 2024
MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition
Fan Zhang
Xiaobao Guo
Xiaojiang Peng
Alex C. Kot
124
2
0
14 Jan 2024
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant
European Conference on Information Retrieval (ECIR), 2024
Mohit Tomar
Abhisek Tiwari
Tulika Saha
Prince Jha
Sriparna Saha
110
1
0
10 Jan 2024
HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling for Long-Term Forecasting
International Conference on Information and Knowledge Management (CIKM), 2024
Shubao Zhao
Ming Jin
Zhaoxiang Hou
Che-Sheng Yang
Zengxiang Li
Qingsong Wen
Yi Wang
209
2
0
10 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
259
63
0
07 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Yinan Han
Jianhua Tao
402
26
0
07 Jan 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification
Zijun Long
R. McCreadie
Muhammad Imran
315
16
0
05 Jan 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
349
17
0
05 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Artificial Intelligence Review (Artif Intell Rev), 2024
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
301
50
0
03 Jan 2024
Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence
Ruizhuo Xu
Linzhi Huang
Mei Wang
Jiani Hu
Weihong Deng
ViT
MedIm
275
5
0
01 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
299
28
0
31 Dec 2023
Morphing Tokens Draw Strong Masked Image Models
International Conference on Learning Representations (ICLR), 2023
Taekyung Kim
Byeongho Heo
Dongyoon Han
737
3
0
30 Dec 2023
Learning Vision from Models Rivals Learning Vision from Data
Computer Vision and Pattern Recognition (CVPR), 2023
Yonglong Tian
Lijie Fan
Kaifeng Chen
Dina Katabi
Dilip Krishnan
Phillip Isola
274
73
0
28 Dec 2023
Learning to Embed Time Series Patches Independently
Seunghan Lee
Taeyoung Park
Kibok Lee
SSL
AI4TS
314
52
0
27 Dec 2023
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
301
235
0
23 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
227
6
0
21 Dec 2023
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
245
26
0
19 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
250
1
0
18 Dec 2023
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha
Huizhen Ji
Jinmin Li
Rongsheng Li
Tao Dai
Bin Chen
Zhi Wang
Shu-Tao Xia
3DPC
232
59
0
17 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
251
6
0
14 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
266
106
0
11 Dec 2023
Large-scale Training of Foundation Models for Wearable Biosignals
International Conference on Learning Representations (ICLR), 2023
Salar Abbaspourazad
Oussama Elachqar
Andrew C. Miller
S. Emrani
Udhyakumar Nallasamy
Ian Shapiro
244
74
0
08 Dec 2023
Emergence and Function of Abstract Representations in Self-Supervised Transformers
Quentin RV. Ferry
Joshua Ching
Takashi Kawai
234
3
0
08 Dec 2023
LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures
Vimal Thilak
Chen Huang
Omid Saremi
Laurent Dinh
Hanlin Goh
Preetum Nakkiran
Josh Susskind
Etai Littwin
294
21
0
07 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
520
10
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
International Conference on Machine Learning (ICML), 2023
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Yaoyao Liu
Cihang Xie
VLM
278
11
0
04 Dec 2023
Previous
1
2
3
4
5
6
...
11
12
13
Next