1904.05862
wav2vec: Unsupervised Pre-training for Speech Recognition
11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
SSL
Papers citing
"wav2vec: Unsupervised Pre-training for Speech Recognition"
50 / 190 papers shown
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
International Conference on Learning Representations (ICLR), 2024
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM
VGen
334
76
0
10 Oct 2024
Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap
Georgia Channing
Juil Sock
Ronald Clark
Christian Schroeder de Witt
194
5
0
09 Oct 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Asian Conference on Machine Learning (ACML), 2024
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Wailing Ng
Chen Zhang
219
3
0
29 Sep 2024
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
Fa-Ting Hong
Yunfei Liu
Yu Li
Changyin Zhou
Fei Yu
D. Xu
DiffM
239
3
0
16 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huang-Cheng Chou
Haibin Wu
Hung-yi Lee
Chi-Chun Lee
410
3
0
16 Sep 2024
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Yao-Fei Cheng
Li-Wei Chen
Hung-Shin Lee
Hsin-Min Wang
298
1
0
13 Sep 2024
Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Juan Yun
Sung Won Han
SLR
454
2
0
12 Sep 2024
What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kavya Manohar
Leena G Pillai
236
4
0
04 Sep 2024
CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention
Gaojie Lin
Jianwen Jiang
Chao Liang
Tianyun Zhong
Jiaqi Yang
Yanbo Zheng
VGen
DiffM
553
33
0
03 Sep 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis
Yijie Jin
209
3
0
27 Aug 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
305
0
0
20 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
247
6
0
28 Jul 2024
Sentiment Reasoning for Healthcare
Khai-Nguyen Nguyen
Khai Le-Duc
Bach Phan Tat
Duy Le
Long Vo-Dang
LRM
385
3
0
24 Jul 2024
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Zhiyuan Chen
Jiajiong Cao
Zhiquan Chen
Yuming Li
Chenguang Ma
VGen
274
159
0
11 Jul 2024
STONE: Self-supervised Tonality Estimator
Yuexuan Kong
Vincent Lostanlen
Gabriel Meseguer-Brocal
Stella Wong
Mathieu Lagrange
Romain Hennequin
329
7
0
10 Jul 2024
Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time
Salvatore Greco
Bartolomeo Vacchetti
D. Apiletti
Tania Cerquitelli
261
13
0
24 Jun 2024
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Mingwang Xu
Hui Li
Qingkun Su
Hanlin Shang
Liwei Zhang
Ce Liu
Jingdong Wang
Yao Yao
Siyu Zhu
VGen
255
166
0
13 Jun 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
185
2
0
12 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Interspeech, 2024
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
305
3
0
09 Jun 2024
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Siavash Shams
Sukru Samet Dindar
Xilin Jiang
N. Mesgarani
Mamba
322
39
0
20 May 2024
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Dongyuan Li
Ying Zhang
Yusong Wang
Funakoshi Kataro
Manabu Okumura
286
3
0
01 May 2024
Mai Hoʻomāuna i ka ʻAi: Language Models Improve Automatic Speech Recognition in Hawaiian
Kaavya Chaparala
Guido Zarrella
Bruce Torres Fischer
Larry Kimura
Oiwi Parker Jones
AuLLM
157
0
0
03 Apr 2024
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Stephanie Fu
Mark Hamilton
Laura E. Brandt
Axel Feldmann
Zhoutong Zhang
William T. Freeman
MDE
317
83
0
15 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
241
2
0
01 Mar 2024
Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
Aniket Tathe
Anand Kamble
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
140
3
0
01 Mar 2024
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian
Qi Wang
Bang Zhang
Liefeng Bo
DiffM
318
218
0
27 Feb 2024
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
Taein Kang
Soyul Han
Sunmook Choi
Jaejin Seo
Sanghyeok Chung
Seungeun Lee
Seungsang Oh
Il-Youp Kwak
247
10
0
27 Feb 2024
Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues
Maneesh Bilalpur
Mert Inan
Dorsa Zeinali
Jeffrey F. Cohn
Malihe Alikhani
276
1
0
13 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
227
32
0
08 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
500
2
0
02 Feb 2024
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiajun He
Xiaohan Shi
Xingfeng Li
Tomoki Toda
215
31
0
24 Jan 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
353
17
0
05 Jan 2024
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
472
13
0
13 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
377
33
0
27 Nov 2023
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
European Signal Processing Conference (EUSIPCO), 2023
Ahmed Adel Attia
Yashish M. Siriwardena
Carol Espy-Wilson
SSL
221
13
0
17 Sep 2023
Indonesian Automatic Speech Recognition with XLSR-53
Social Science Research Network (SSRN), 2022
Panji Arisaputra
Amalia Zahra
116
10
0
20 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
326
378
0
10 Aug 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
298
65
0
20 Jul 2023
Multimodal Audio-textual Architecture for Robust Spoken Language Understanding
Anderson R. Avila
Mehdi Rezagholizadeh
Chao Xing
162
1
0
12 Jun 2023
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
Affective Computing and Intelligent Interaction (ACII), 2023
Tiantian Feng
Shrikanth Narayanan
257
41
0
08 Jun 2023
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
Interspeech, 2023
Doyeon Kim
Soo-Whan Chung
Hyewon Han
Youna Ji
Hong-Goo Kang
178
12
0
02 Jun 2023
Scaling Speech Technology to 1,000+ Languages
Journal of machine learning research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
389
522
0
22 May 2023
Duplex Diffusion Models Improve Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xianchao Wu
DiffM
219
5
0
22 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tiantian Feng
Rajat Hebbar
Shrikanth Narayanan
167
11
0
18 May 2023
A Survey on Time-Series Pre-Trained Models
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023
Qianli Ma
Ziqiang Liu
Zhenjing Zheng
Ziyang Huang
Siying Zhu
Zhongzhong Yu
James T. Kwok
AI4TS
282
88
0
18 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Neural Networks (NN), 2022
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
356
21
0
05 May 2023
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
ACM Multimedia (ACM MM), 2023
Zheng Lian
Haiyang Sun
Kang Chen
Mingyu Xu
...
Meng Wang
Xiaoshi Zhong
Guoying Zhao
Björn W. Schuller
Jianhua Tao
271
81
0
18 Apr 2023
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
Soumya Dutta
Sriram Ganapathy
354
23
0
14 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
298
18
0
11 Apr 2023
Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Yujin Wu
Mohamed Daoudi
A. Amad
196
77
0
29 Mar 2023