2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 609 papers shown
Title
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
269
5
0
07 Dec 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Zihan Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
354
39
0
27 Nov 2024
Image Generation Diversity Issues and How to Tame Them
Computer Vision and Pattern Recognition (CVPR), 2024
Mischa Dombrowski
Weitong Zhang
Sarah Cechnicka
Hadrien Reynaud
Bernhard Kainz
322
11
0
25 Nov 2024
Everything is a Video: Unifying Modalities through Next-Frame Prediction
G. Hudson
Dean L. Slack
T. Winterbottom
Jamie Sterling
Chenghao Xiao
Junjie Shentu
Noura Al Moubayed
265
2
0
15 Nov 2024
ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Zixing Zhang
Weixiang Xu
Zhongren Dong
Kanglin Wang
Yimeng Wu
Jing Peng
Runming Wang
Dong-Yan Huang
77
7
0
14 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haoyang Li
266
11
0
05 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
512
6
0
02 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
297
5
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Conference on Robot Learning (CoRL), 2024
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
242
47
0
31 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ella Zeldes
Or Tal
Yossi Adi
155
3
0
28 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Shengbang Tong
309
6
0
25 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Carlos Carvalho
A. Abad
200
1
0
18 Oct 2024
Self-supervised contrastive learning performs non-linear system identification
International Conference on Learning Representations (ICLR), 2024
Rodrigo González Laiz
Tobias Schmidt
Steffen Schneider
SSL
271
3
0
18 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ashish Seth
Ramaneswaran Selvakumar
S. Sakshi
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
237
4
0
17 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Spoken Language Technology Workshop (SLT), 2024
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
210
2
0
15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
299
0
0
14 Oct 2024
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation
Conference on Robot Learning (CoRL), 2024
Youwei Yu
Junhong Xu
Lantao Liu
151
0
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
573
11
0
14 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Sehun Kim
170
7
0
11 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
158
2
0
09 Oct 2024
Forte : Finding Outliers with Representation Typicality Estimation
International Conference on Learning Representations (ICLR), 2024
Debargha Ganguly
Warren Morningstar
A. Yu
Vipin Chaudhary
OODD
236
4
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
International Conference on Learning Representations (ICLR), 2024
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
461
5
0
02 Oct 2024
You Only Speak Once to See
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
190
4
0
27 Sep 2024
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization
Rafael Mendoza
Isabella Cruz
Richard Liu
Aarav Deshmukh
David Williams
Jesscia Peng
Rohan Iyer
305
3
0
25 Sep 2024
Point-PNG: Conditional Pseudo-Negatives Generation for Point Cloud Pre-Training
Sutharsan Mahendren
Saimunur Rahman
Piotr Koniusz
Tharindu Fernando
Sridha Sridharan
Clinton Fookes
Peyman Moghadam
3DPC
325
0
0
24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
157
4
0
23 Sep 2024
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
International Conference on Speech and Computer (SPECOM), 2024
Nikola Ljubesic
Peter Rupnik
Danijel Koržinek
212
5
0
23 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
217
16
0
19 Sep 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Spoken Language Technology Workshop (SLT), 2024
Yi-Jen Shih
Zoi Gkalitsiou
A. Dimakis
David Harwath
231
6
0
16 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huang-Cheng Chou
Haibin Wu
Hung-yi Lee
Chi-Chun Lee
358
3
0
16 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
209
2
0
13 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
Teresa Dorszewski
Lenka Tětková
Lorenz Linhardt
Lars Kai Hansen
HAI
212
1
0
10 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
Mustansar Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
911
12
0
30 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Neural Information Processing Systems (NeurIPS), 2024
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
253
19
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
International Conference on Learning Representations (ICLR), 2024
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
360
118
0
29 Aug 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis
Yijie Jin
182
3
0
27 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
281
4
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
164
8
0
20 Aug 2024
mRNA2vec: mRNA Embedding with Language Model in the 5'UTR-CDS for mRNA Design
AAAI Conference on Artificial Intelligence (AAAI), 2024
Honggen Zhang
Xiangrui Gao
Igor Molybog
Lipeng Lai
178
4
0
16 Aug 2024
SpectralEarth: Training Hyperspectral Foundation Models at Scale
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
Nassim Ait Ali Braham
C. Albrecht
Julien Mairal
J. Chanussot
Yi Wang
X. Zhu
293
30
0
15 Aug 2024
Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation
International Society for Music Information Retrieval Conference (ISMIR), 2024
Alain Riou
Stefan Lattner
Gaëtan Hadjeres
Michael Anslow
Geoffroy Peeters
233
6
0
05 Aug 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
284
11
0
31 Jul 2024
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances
Mieko Ochi
Ziwei Gong
D. Komura
Pengyuan Shi
Kaan Donbekci
Julia Hirschberg
352
37
0
31 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
365
22
0
26 Jul 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
182
14
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
329
23
0
21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
267
1
0
18 Jul 2024
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Carlos Hinojosa
Shuming Liu
Guohao Li
221
8
0
17 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
386
14
0
16 Jul 2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Ioannis Maniadis Metaxas
Georgios Tzimiropoulos
Ioannis Patras
SSL
268
2
0
15 Jul 2024