wav2vec: Unsupervised Pre-training for Speech Recognition


11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
    SSL

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 190 papers shown
EmoCAST: Emotional Talking Portrait via Emotive Text Description
Yiguo Jiang
Xiaodong Cun
Yong Zhang
Yudian Zheng
Fan Tang
Chi-Man Pun
DiffM
132
0
0
24 Dec 2025
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
Nimrod Berman
O. Joglekar
Eitan Kosman
Dotan Di Castro
Omri Azencot
DiffM
221
2
0
23 Oct 2025
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
G. Abati
J. C. V. Soares
Giulio Turrisi
Victor Barasuol
Claudio Semini
121
0
0
16 Oct 2025
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Achleshwar Luthra
Priyadarsi Mishra
Tomer Galanti
SSL
171
0
0
09 Oct 2025
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Shujie Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
LLMAG
ReLM
RALM
LRM
184
3
0
08 Oct 2025
AgentDR Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents
Mingdai Yang
Nurendra Choudhary
Jiangshu Du
Edward W. Huang
Philip S. Yu
Karthik Subbian
Danai Kourta
148
0
0
07 Oct 2025
Audio Driven Real-Time Facial Animation for Social Telepresence
Jiye Lee
Chenghui Li
Linh Tran
S. Wei
Jason M. Saragih
Alexander Richard
Hanbyul Joo
Shaojie Bai
VGen
152
0
0
01 Oct 2025
Reference-free automatic speech severity evaluation using acoustic unit language modelling
B. Halpern
Tomoki Toda
115
2
0
01 Oct 2025
StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing
Liyang Chen
Tianze Zhou
Xu He
Boshi Tang
Zhiyong Wu
Yang Huang
Yang Wu
Zhongqian Sun
Wei Yang
Helen M. Meng
DiffM
202
0
0
26 Sep 2025
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
Tianle Lyu
Junchuan Zhao
Ye Wang
VGen
122
0
0
24 Sep 2025
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
Niclas Pokel
Pehuén Moure
Roman Boehringer
Shih-Chii Liu
Yingqiang Gao
127
0
0
23 Sep 2025
SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation
Xicheng Zhang
Yuan Gao
Wangjin Zhou
Zicheng Yuan
Keisuke Imoto
Tatsuya Kawahara
CLL
113
0
0
19 Sep 2025
Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion
Shanghong Li
Chiam Wen Qi Ruth
Hong Xu
Fang Liu
111
0
0
19 Sep 2025
Speech Language Models for Under-Represented Languages: Insights from Wolof
Yaya Sy
Dioula Doucouré
Christophe Cerisara
Irina Illina
AuLLM
145
0
0
18 Sep 2025
Unified Learnable 2D Convolutional Feature Extraction for ASR
Peter Vieting
Benedikt Hilmes
Ralf Schluter
Hermann Ney
SSL
158
0
0
12 Sep 2025
Contextualized Token Discrimination for Speech Search Query Correction
Junyu Lu
Di Jiang
Mengze Hong
Victor Junqiu Wei
Qintian Guo
Zhiyang Su
113
2
0
04 Sep 2025
Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
Abdullah Abdelfattah
M. Khalil
Hazem M. Abbas
120
0
0
27 Aug 2025
Wan-S2V: Audio-Driven Cinematic Video Generation
Xin Gao
Li Hu
Siqi Hu
Mingyang Huang
Chaonan Ji
...
Peng Zhang
Xindi Zhang
Zhe Zhang
Jingren Zhou
Lian Zhuo
DiffM
VGen
142
20
0
26 Aug 2025
Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
Tai Vu
176
0
0
26 Aug 2025
Whisper based Cross-Lingual Phoneme Recognition between Vietnamese and English
Nguyen Huu Nhat Minh
Tran Nguyen Anh
Truong Dinh Dung
Vo Van Nam
Le Pham Tuyen
89
1
0
22 Aug 2025
Foundation Models for Cross-Domain EEG Analysis Application: A Survey
Hongqi Li
Yitong Chen
Yujuan Wang
Weihang Ni
Haodong Zhang
196
2
0
21 Aug 2025
CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing
Abdul Rehman
Jian-Jun Zhang
Xiaosong Yang
130
1
0
21 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
145
1
0
19 Aug 2025
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
Shaoshu Yang
Zhe Kong
Feng Gao
Meng Cheng
Xiangyu Liu
...
Zhuoliang Kang
Tong Lu
Xunliang Cai
Ran He
Xiaoming Wei
VGen
127
10
0
19 Aug 2025
HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
Hyebin Ahn
Kangwook Jang
Hoirin Kim
101
1
0
17 Aug 2025
Class Unbiasing for Generalization in Medical Diagnosis
Lishi Zuo
Man-Wai Mak
Lu Yi
Youzhi Tu
187
0
0
09 Aug 2025
Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
Jingyuan Xing
Zhipeng Li
Jialong Mai
Xiaofen Xing
Xiangmin Xu
215
0
0
06 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
384
11
0
01 Aug 2025
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
Xiaoxu Zhu
Junhua Li
Aaron J. Li
Yiming Ren
Baoxiang Li
189
0
0
19 Jul 2025
MoDA: Multi-modal Diffusion Architecture for Talking Head Generation
Xinyang Li
Gen Li
Zhihui Lin
Yichen Qian
Gongxin Yao
Weinan Jia
Aowen Wang
Weihua Chen
Fan Wang
DiffM
VGen
282
0
0
04 Jul 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
226
0
0
01 Jul 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
1.3K
85
0
01 Jul 2025
Manipulated Regions Localization For Partially Deepfake Audio: A Survey
Jiayi He
Jiangyan Yi
Jianhua Tao
Siding Zeng
Hao Gu
193
2
0
17 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
373
4
0
05 Jun 2025
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
International Conference on Signal Processing and Communications (ICSPC), 2024
Saurabh Agrawal
Raj Gohil
Gopal Kumar Agrawal
Vikram C M
Kushal Verma
150
1
0
02 Jun 2025
Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
Journal of King Saud University: Computer and Information Sciences (J. King Saud Univ. Comput. Inf. Sci.), 2025
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
346
0
0
17 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
457
1
0
29 Apr 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
342
0
0
21 Apr 2025
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian
Shuo Xing
Shawn Li
Yue Zhao
Zhengzhong Tu
328
11
0
14 Mar 2025
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
Baptiste Chopin
Tashvik Dhamija
P. Balaji
Yaohui Wang
A. Dantcheva
DiffM
VGen
285
3
0
24 Feb 2025
Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
Taj Jones-McCormick
Aukosh Jagannath
S. Sen
405
2
0
24 Feb 2025
On the Robust Approximation of ASR Metrics
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
315
2
0
18 Feb 2025
Evaluation of Deep Audio Representations for Hearables
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Fabian Gröger
Pascal Baumann
Ludovic Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
364
1
0
10 Feb 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Rajath Rao
Adithya Ganesan
Oscar Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. Andrew Schwartz
460
2
0
15 Jan 2025
FAST: Fast Audio Spectrogram Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Anugunj Naman
Gaibo Zhang
144
2
0
03 Jan 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
321
11
0
26 Dec 2024
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Computer Vision and Pattern Recognition (CVPR), 2024
Jiahao Cui
Hui Li
Yun Zhan
Hanlin Shang
K. Cheng
Yuqi Ma
Shan Mu
Hang Zhou
Jingdong Wang
Siyu Zhu
ViT
VGen
545
78
0
01 Dec 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
231
3
0
24 Oct 2024
Detecting Adversarial Examples
Furkan Mumcu
Yasin Yilmaz
AAML
260
4
0
22 Oct 2024
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
365
2
0
14 Oct 2024