Title
Temporal-Frequency State Space Duality: An Efficient Paradigm for Speech Emotion Recognition Jiaqi Zhao Fei Wang Kun Li Yanyan Wei Shengeng Tang Shu Zhao Xiao Sun Mamba 91 2 0 22 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction Xinfa Zhu WenJie Tian Lei Xie VLM 165 4 0 22 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration Sangmin Lee Woo-Jin Chung Hong-Goo Kang 65 0 0 19 Dec 2024
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis Zhoulin Ji Chenhao Lin Hang Wang Chao Shen 94 0 0 12 Dec 2024
Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection Rongfeng Su Changqing Xu Xinyi Wu Feng Xu Xie Chen Lan Wang Nan Yan 29 0 0 09 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR Pengcheng Guo Xuankai Chang Hang Lv Shinji Watanabe Lei Xie 54 0 0 07 Dec 2024
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing Yen-Ju Lu Jing Liu Thomas Thebaud Laureano Moro Velázquez Ariya Rastrow Najim Dehak Jesus Villalba 65 1 0 05 Dec 2024
FreeCodec: A disentangled neural speech codec with fewer tokens Youqiang Zheng Weiping Tu Yueteng Kang Jie Chen Yike Zhang Li Xiao Yuhong Yang Long Ma 62 1 0 02 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding Julian Parker Anton Smirnov Jordi Pons CJ Carr Zack Zukowski Zach Evans Xubo Liu 68 9 0 29 Nov 2024
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario Shih-Heng Wang Zih-Ching Chen Jiatong Shi Ming To Chuang Guan-Ting Lin Kuan Po Huang David F. Harwath Shang-Wen Li Hung-yi Lee 70 1 0 27 Nov 2024
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition Shih-Heng Wang Jiatong Shi Chien-yu Huang Shinji Watanabe Hung-yi Lee 59 0 0 27 Nov 2024
Multi-Resolution Generative Modeling of Human Motion from Limited Data David Eduardo Moreno-Villamarín A. Hilsmann Peter Eisert DiffM 3DH 76 0 0 25 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations Youngjun Sim Jinsung Yoon Young-Joo Suh 64 0 0 25 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM Jiawei Yu Y. Li Xiaosong Qiao Huan Zhao Xiaofeng Zhao Wei Tang M. Zhang Hao Yang Jinsong Su 63 0 0 20 Nov 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems Jingyu Li Aemon Yat Fei Chiu Tan Lee 54 0 0 18 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation Kuiyuan Zhang Zhongyun Hua Yushu Zhang Yifang Guo Tao Xiang 21 0 0 14 Nov 2024
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech Eleonora Mancini Francesco Paissan Paolo Torroni Mirco Ravanelli Cem Subakan 42 0 0 12 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition Yoshiki Masuyama Koichi Miyazaki Masato Murata Mamba 28 0 0 11 Nov 2024
CTC-Assisted LLM-Based Contextual ASR Guanrou Yang Z. Ma Zhifu Gao Shiliang Zhang Xie Chen 21 2 0 10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward Shashi Kumar Iuliia Thorbecke Sergio Burdisso Esaú Villatoro-Tello M. Errecalde Kadri Hacioğlu Pradeep Rangappa P. Motlícek A. Ganapathiraju Andreas Stolcke 41 1 0 06 Nov 2024
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models Wen-Chin Huang Erica Cooper T. Toda 21 4 0 06 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch Wupeng Wang Zexu Pan X. Li Shuai Wang H. Li 24 3 0 05 Nov 2024
Real-Time Scream Detection and Position Estimation for Worker Safety in Construction Sites Bikalpa Gautam Anmol Guragain Sarthak Giri 24 0 0 05 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Seong-Whan Lee 37 3 0 04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models Heng-Jui Chang Hongyu Gong Changhan Wang James R. Glass Yu-An Chung 26 0 0 31 Oct 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions Theo Clark Benedetta Cevoli Eloy de Jong Timofey Abramski Jamie Dougherty SSL 31 0 0 31 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Théodor Lemerle Harrison Vanderbyl Vaibhav Srivastav Nicolas Obin Axel Roebel 31 1 0 30 Oct 2024
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features Abdelrahman Abdelwahab Ayaan Vaswani Advait Bharathulwar Arnav Kommaraju 16 1 0 26 Oct 2024
Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation Sixu An X. Sun Yicong Li Yu Yang Guandong Xu 26 0 0 26 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques David Ortiz-Perez Manuel Benavent-Lledo José García Rodríguez David Tomás M. Flores Vizcaya-Moreno 16 0 0 24 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning Yifan Peng Krishna C. Puvvada Zhehuai Chen Piotr Żelasko He Huang Kunal Dhawan Ke Hu Shinji Watanabe Jagadeesh Balam Boris Ginsburg 41 2 0 23 Oct 2024
Characterizing Robocalls with Multiple Vantage Points Sathvik Prasad Aleksandr Nahapetyan Bradley Reaves 19 0 0 22 Oct 2024
Continuous Speech Tokenizer in Text To Speech Yixing Li Ruobing Xie X. Sun Yu Cheng Zhanhui Kang AuLLM CLL 31 2 0 22 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec Yiwei Guo Zhihan Li Chenpeng Du Hankun Wang Xie Chen Kai Yu 31 0 0 21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example Suhita Ghosh Melanie Jouaiti Arnab Das Yamini Sinha Tim Polzehl Ingo Siegert Sebastian Stober 18 2 0 20 Oct 2024
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses Suhita Ghosh Tim Thiele Frederic Lorbeer Frank Dreyer Sebastian Stober 25 0 0 20 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS T. Nguyen Seymanur Akti Ngoc-Quan Pham A. Waibel 18 0 0 19 Oct 2024
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention Yuzhe Weng Haotian Wang Tian Gao Kewei Li Shutong Niu Jun Du 28 0 0 19 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup Carlos Carvalho A. Abad 11 0 0 18 Oct 2024
Optimal Transport Maps are Good Voice Converters Arip Asadulaev Rostislav Korst V. Shutov Alexander Korotin Yaroslav Grebnyak Vahe Egiazarian E. Burnaev OT 17 1 0 17 Oct 2024
STCON System for the CHiME-8 Challenge Anton Mitrofanov Tatiana Prisyach Tatiana Timofeeva Sergei Novoselov M. Korenevsky ... Dmitriy Miroshnichenko Nikita Mamaev Ilya Odegov Olga Rudnitskaya A. Romanenko 16 1 0 17 Oct 2024
On the Use of Audio to Improve Dialogue Policies Daniel Roncel Federico Costa Javier Hernando 19 0 0 17 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features Natsuo Yamashita Masaaki Yamamoto Y. Kawaguchi 16 0 0 17 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning Ashish Seth Ramaneswaran Selvakumar S. Sakshi Sonal Kumar Sreyan Ghosh Dinesh Manocha 19 0 0 17 Oct 2024
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks Orchid Chetia Phukan Devyani Koshal Swarup Ranjan Behera Arun Balaji Buduru Rajesh Sharma 16 0 0 16 Oct 2024
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning Sarthak Jain Orchid Chetia Phukan Swarup Ranjan Behera Arun Balaji Buduru Rajesh Sharma CLL 19 0 0 16 Oct 2024
Exploiting Longitudinal Speech Sessions via Voice Assistant Systems for Early Detection of Cognitive Decline Kristin Qi Jiatong Shi Caroline Summerour J. Batsis Xiaohui Liang 26 0 0 16 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing Takanori Ashihara Takafumi Moriya Shota Horiguchi Junyi Peng Tsubasa Ochiai Marc Delcroix Kohei Matsuura Hiroshi Sato 16 1 0 15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations Hemant Yadav R. Shah Sunayana Sitaram 11 0 0 14 Oct 2024
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads Federico Nocentini T. Besnier Claudio Ferrari Sylvain Arguillere Stefano Berretti Mohamed Daoudi 53 1 0 14 Oct 2024