wav2vec: Unsupervised Pre-training for Speech Recognition

11 April 2019

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 190 papers shown

EmoCAST: Emotional Talking Portrait via Emotive Text Description

132

24 Dec 2025

Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

221

23 Oct 2025

Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning

121

16 Oct 2025

On the Alignment Between Supervised and Self-Supervised Contrastive Learning

171

09 Oct 2025

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

184

08 Oct 2025

AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents

148

07 Oct 2025

Audio Driven Real-Time Facial Animation for Social Telepresence

152

01 Oct 2025

Reference-free automatic speech severity evaluation using acoustic unit language modelling

B. Halpern

Tomoki Toda

115

01 Oct 2025

StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing

202

26 Sep 2025

KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation

122

24 Sep 2025

Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition

127

23 Sep 2025

SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation

113

19 Sep 2025

Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion

111

19 Sep 2025

Speech Language Models for Under-Represented Languages: Insights from Wolof

145

18 Sep 2025

Unified Learnable 2D Convolutional Feature Extraction for ASR

158

12 Sep 2025

Contextualized Token Discrimination for Speech Search Query Correction

113

04 Sep 2025

Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning

Abdullah Abdelfattah

M. Khalil

Hazem M. Abbas

120

27 Aug 2025

Wan-S2V: Audio-Driven Cinematic Video Generation

...

142

26 Aug 2025

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition

Tai Vu

176

26 Aug 2025

Whisper based Cross-Lingual Phoneme Recognition between Vietnamese and English

22 Aug 2025

Foundation Models for Cross-Domain EEG Analysis Application: A Survey

196

21 Aug 2025

CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing

Abdul Rehman

Jian-Jun Zhang

Xiaosong Yang

130

21 Aug 2025

EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition

Hugo Thimonier

Antony Perzo

Renaud Seguier

145

19 Aug 2025

InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing

...

127

19 Aug 2025

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization

Hyebin Ahn

Kangwook Jang

Hoirin Kim

101

17 Aug 2025

Class Unbiasing for Generalization in Medical Diagnosis

187

09 Aug 2025

Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech

215

06 Aug 2025

Multimodal Referring Segmentation: A Survey

384

01 Aug 2025

Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

189

19 Jul 2025

MoDA: Multi-modal Diffusion Architecture for Talking Head Generation

282

04 Jul 2025

Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding

226

01 Jul 2025

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

1.3K

01 Jul 2025

Manipulated Regions Localization For Partially Deepfake Audio: A Survey

193

17 Jun 2025

AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models

373

05 Jun 2025

SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction

International Conference on Signal Processing and Communications (ICSPC), 2024

150

02 Jun 2025

Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing

Journal of King Saud University: Computer and Information Sciences (J. King Saud Univ. Comput. Inf. Sci.), 2025

346

17 May 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

457

29 Apr 2025

StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

342

21 Apr 2025

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

328

14 Mar 2025

Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation

285

24 Feb 2025

Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models

Taj Jones-McCormick

Aukosh Jagannath

S. Sen

405

24 Feb 2025

On the Robust Approximation of ASR Metrics

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

315

18 Feb 2025

Evaluation of Deep Audio Representations for Hearables

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Fabian Gröger

Pascal Baumann

Ludovic Amruthalingam

Laurent Simon

Ruksana Giurda

Simone Lionetti

364

10 Feb 2025

WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

460

15 Jan 2025

FAST: Fast Audio Spectrogram Transformer

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Anugunj Naman

Gaibo Zhang

144

03 Jan 2025

Memory-Centric Computing: Recent Advances in Processing-in-DRAM

321

26 Dec 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

Computer Vision and Pattern Recognition (CVPR), 2024

545

01 Dec 2024

Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques

Applied Soft Computing (Appl. Soft Comput.), 2024

David Ortiz-Perez

Manuel Benavent-Lledo

José García Rodríguez

David Tomás

M. Flores Vizcaya-Moreno

231

24 Oct 2024

Detecting Adversarial Examples

Furkan Mumcu

Yasin Yilmaz

AAML

260

22 Oct 2024

Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads

365

14 Oct 2024