Title
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering Ya-Zhen Song Zhuo Chen Xiaofei Wang Ziyang Ma Xie Chen AuLLM 111 42 0 14 Jan 2024
Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification? Changye Li Weizhe Xu Trevor Cohen Serguei V. S. Pakhomov 94 8 0 10 Jan 2024
Hyperbolic Distance-Based Speech Separation Darius Petermann Minje Kim 71 4 0 07 Jan 2024
Pheme: Efficient and Conversational Speech Generation Paweł Budzianowski Taras Sereda Tomasz Cichy Ivan Vulić 78 7 0 05 Jan 2024
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions H. S. Bovbjerg Jesper Jensen Jan Østergaard Zheng-Hua Tan VLM 59 3 0 27 Dec 2023
Audiobox: Unified Audio Generation with Natural Language Prompts Apoorv Vyas Bowen Shi Matt Le Andros Tjandra Yi-Chiao Wu ... Chris Summers Carleigh Wood Joshua Lane Mary Williamson Wei-Ning Hsu 124 94 0 25 Dec 2023
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Anirudh S. Sundar Chao-Han Huck Yang David M. Chan Shalini Ghosh Venkatesh Ravichandran P. S. Nidadavolu MoMe 94 9 0 22 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning Luis Lugo Valentin Vielzeuf SSL 57 1 0 18 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Xueyao Zhang Liumeng Xue Yicheng Gu Yuancheng Wang Haorui He ... Mingxuan Wang Jun Han Kai Chen Haizhou Li Zhizheng Wu 91 35 0 15 Dec 2023
Generative Context-aware Fine-tuning of Self-supervised Speech Models Suwon Shon Kwangyoun Kim Prashant Sridhar Yi-Te Hsu Shinji Watanabe Karen Livescu 63 2 0 15 Dec 2023
Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech Geoffrey T. Frost Emily Morris Joshua Jansen van Vüren T. Niesler 60 2 0 15 Dec 2023
Audio-visual fine-tuning of audio-only ASR models Avner May Dmitriy Serdyuk Ankit Parag Shah Otavio Braga Olivier Siohan 70 3 0 14 Dec 2023
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition Chengxi Lei Satwinder Singh Feng Hou Xiaoyun Jia Ruili Wang 60 1 0 13 Dec 2023
Extending Whisper with prompt tuning to target-speaker ASR Hao Ma Zhiyuan Peng Mingjie Shao Jing Li Xuelong Li VLM 81 16 0 13 Dec 2023
Revisiting the Entropy Semiring for Neural Speech Recognition Oscar Chang DongSeon Hwang Olivier Siohan 79 3 0 13 Dec 2023
End-to-End Speech-to-Text Translation: A Survey Nivedita Sethiya Chandresh Kumar Maurya 90 8 0 02 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Sang-Hoon Lee Haram Choi Seung-Bin Kim Seong-Whan Lee BDL 122 37 0 21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis Jungil Kong Junmo Lee Jeongmin Kim Beomjeong Kim Jihoon Park Dohee Kong Changheon Lee Sangjin Kim 84 1 0 20 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness Vikentii Pankov Valeria Pronina Alexander Kuzmin Maksim Borisov Nikita Usoltsev Xingshan Zeng Alexander Golubkov Nikolai Ermolenko Aleksandra Shirshova Yulia Matveeva 52 5 0 16 Nov 2023
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition Andrei Barcovschi Rishabh Jain Peter Corcoran 47 3 0 07 Nov 2023
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition Isaac Slaughter Craig Greenberg Reva Schwartz Aylin Caliskan 93 5 0 29 Oct 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 102 36 0 25 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens Feiyu Shen Yiwei Guo Chenpeng Du Xie Chen Kai Yu 66 13 0 23 Oct 2023
Unintended Memorization in Large ASR Models, and How to Mitigate It Lun Wang Om Thakkar Rajiv Mathews 83 6 0 18 Oct 2023
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio Antoni Dimitriadis Siqi Pan V. Sethu Beena Ahmed SSL 53 3 0 17 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations Paarth Neekhara Shehzeen Samarah Hussain Rafael Valle Boris Ginsburg Rishabh Ranjan Shlomo Dubnov F. Koushanfar Julian McAuley 50 3 0 14 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words Robin Algayres Pablo Diego-Simon Benoît Sagot Emmanuel Dupoux 96 1 0 08 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens Robin Algayres Yossi Adi Tu Nguyen Jade Copet Gabriel Synnaeve Benoît Sagot Emmanuel Dupoux AuLLM 116 16 0 08 Oct 2023
Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations Manon Macary Marie Tahon Yannick Esteve Daniel Luzzati 68 3 0 06 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model Kai-Wei Chang Ming-Hsin Chen Yun-Ping Lin Jing Neng Hsu Paul Kuo-Ming Huang Chien-yu Huang Shang-Wen Li Hung-yi Lee 100 6 0 04 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction Jiatong Shi Hirofumi Inaguma Xutai Ma Ilia Kulikov Anna Y. Sun 111 27 0 04 Oct 2023
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching Liming Wang M. Hasegawa-Johnson Chang D. Yoo SSL 86 2 0 03 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation Dongchao Yang Jinchuan Tian Xuejiao Tan Rongjie Huang Songxiang Liu ... Jiang Bian Xixin Wu Zhou Zhao Shinji Watanabe Helen M. Meng CVBM AuLLM 103 128 0 01 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS Xin Wang Taein Kwon Wei-Ning Hsu Yossi Adi Tu Nguyen D. Bohus Emmanuel Dupoux Neel Joshi Abdelrahman Mohamed 42 4 0 29 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models Cheng Chen Yuchen Hu Chao-Han Huck Yang Sabato Marco Siniscalchi Pin-Yu Chen Eng Siong Chng 96 48 0 27 Sep 2023
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting Chao-Han Huck Yang Yile Gu Yi-Chieh Liu Shalini Ghosh I. Bulyko A. Stolcke KELM LRM 106 52 0 27 Sep 2023
Human Transcription Quality Improvement Jian Gao Hanbo Sun Cheng Cao Zheng Du 137 3 0 24 Sep 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context Wei Kang Xiaoyu Yang Zengwei Yao Fangjun Kuang Yifan Yang Liyong Guo Long Lin Daniel Povey 83 56 0 15 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks Sizhou Chen Songyang Gao Sen Fang 26 0 0 14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer Yongqiang Wang Jionghao Bai Rongjie Huang Ruiqi Li Zhiqing Hong Zhou Zhao 49 3 0 14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks Soumi Maiti Yifan Peng Shukjae Choi Jee-weon Jung Xuankai Chang Shinji Watanabe VLM AuLLM 123 69 0 14 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning L. Pepino Pablo Riera Luciana Ferrer 76 5 0 14 Sep 2023
Simultaneous Measurement of Multiple Acoustic Attributes Using Structured Periodic Test Signals Including Music and Other Sound Materials Hideki Kawahara Kohei Yatabe Ken-Ichi Sakakibara M. Mizumachi Tatsuya Kitamura 21 1 0 06 Sep 2023
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion Wen-Chin Huang Tomoki Toda CVBM 89 5 0 05 Sep 2023
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer Kyuhong Shim Jinkyu Lee Simyoung Chang Kyuwoong Hwang 73 3 0 31 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads Salah Zaiem Youcef Kemiche Titouan Parcollet S. Essid Mirco Ravanelli SSL 43 10 0 28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Min Zhang Björn W. Schuller LM&MA AuLLM 180 39 0 24 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs Fangyun Wei Yutong Chen SLR 69 33 0 21 Aug 2023
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition Hakan Erdogan Scott Wisdom Xuankai Chang Zalan Borsos Marco Tagliasacchi Neil Zeghidour J. Hershey 69 11 0 21 Aug 2023
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition Anant Singh Akshat Gupta 66 5 0 17 Aug 2023