Title
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes Xilin Jiang Yinghao Aaron Li N. Mesgarani CLL 11 1 0 29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition Haomiao Yang Jinming Zhao Gholamreza Haffari Ehsan Shareghi 14 6 0 28 May 2023
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models Yifan Peng Yui Sudo Muhammad Shakeel Shinji Watanabe 11 37 0 28 May 2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification Ju-Sung Heo Chan-yeong Lim Ju-ho Kim Hyun-Seo Shin Ha-Jin Yu 6 2 0 27 May 2023
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation Yuta Nishikawa Satoshi Nakamura 22 4 0 26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis Seong-Hyun Park Bohyung Kim Tae-Hyun Oh 19 1 0 26 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition Wangyou Zhang Y. Qian 33 10 0 25 May 2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation Tianrui Wang Long Zhou Zi-Hua Zhang Yu-Huan Wu Shujie Liu Yashesh Gaur Zhuo Chen Jinyu Li Furu Wei 32 100 0 25 May 2023
Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model Aoi Ito Shota Horiguchi SSL 6 2 0 24 May 2023
LAraBench: Benchmarking Arabic AI with Large Language Models Ahmed Abdelali Hamdy Mubarak Shammur A. Chowdhury Maram Hasanain Basel Mousi ... Yousseif Elshahawy Ahmed M. Ali Nadir Durrani Natasa Milic-Frayling Firoj Alam ELM LM&MA 6 18 0 24 May 2023
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss Hiroshi Sato Ryo Masumura Tsubasa Ochiai Marc Delcroix Takafumi Moriya ... Kentaro Shinayama Saki Mizuno Mana Ihori Tomohiro Tanaka Nobukatsu Hojo 18 5 0 24 May 2023
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications Vamsikrishna Chemudupati Marzieh S. Tahaei Heitor R. Guimarães Arthur Pimentel Anderson R. Avila Mehdi Rezagholizadeh Boxing Chen Tiago H. Falk SSL 29 6 0 23 May 2023
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings Anfeng Xu Rajat Hebbar Rimita Lahiri Tiantian Feng Lindsay K. Butler Lue Shen Helen Tager-Flusberg Shrikanth Narayanan 11 10 0 23 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino 37 3 0 23 May 2023
Improving speech translation by fusing speech and text Wenbiao Yin Zhicheng Liu Chengqi Zhao Tao Wang Jian-Fei Tong Rong Ye 13 4 0 23 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Eklavya Sarkar Mathew Magimai.-Doss 8 11 0 23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI Yuwei Fang Mahmoud Khademi Chenguang Zhu Ziyi Yang Reid Pryzant ... Yao Qian Takuya Yoshioka Lu Yuan Michael Zeng Xuedong Huang 25 2 0 23 May 2023
Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition Yaoting Wang Yuanchao Li Paul Pu Liang Louis-Philippe Morency P. Bell Catherine Lai CVBM 21 8 0 23 May 2023
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization Marc Delcroix Naohiro Tawara Mireia Díez Federico Landini Anna Silnova A. Ogawa Tomohiro Nakatani L. Burget S. Araki 24 5 0 23 May 2023
SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents Shuzheng Si Wen-Cheng Ma Haoyu Gao Yuchuan Wu Ting-En Lin Yinpei Dai Hangyu Li Rui Yan Fei Huang Yongbin Li AuLLM 24 27 0 22 May 2023
The defender's perspective on automatic speaker verification: An overview Haibin Wu Jiawen Kang Lingwei Meng H. Meng Hung-yi Lee AAML 17 14 0 22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data Ziyi Yang Mahmoud Khademi Yichong Xu Reid Pryzant Yuwei Fang ... Yu Shi Lu Yuan Takuya Yoshioka Michael Zeng Xuedong Huang 17 2 0 21 May 2023
Self-supervised representations in speech-based depression detection Wen Wu C. Zhang P. Woodland 11 23 0 20 May 2023
North Sámi Dialect Identification with Self-supervised Speech Models Sofoklis Kakouros Katri Hiovain-Asikainen 17 4 0 19 May 2023
Scaling laws for language encoding models in fMRI Richard Antonello Aditya R. Vaidya Alexander G. Huth MedIm 14 55 0 19 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning Jiyang Tang William Chen Xuankai Chang Shinji Watanabe B. MacWhinney 11 10 0 19 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation Kangwook Jang Sungnyun Kim Se-Young Yun Hoi-Rim Kim 16 5 0 19 May 2023
Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment Tianshu Yu Haoyu Gao Ting-En Lin Min Yang Yuchuan Wu Wen-Cheng Ma Chao Wang Fei Huang Yongbin Li 19 18 0 19 May 2023
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring Kaiqi Fu Shaojun Gao Shuju Shi Xiaohai Tian Wei Li Zejun Ma 18 2 0 19 May 2023
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model Puyuan Peng Shang-Wen Li Okko Rasanen Abdel-rahman Mohamed David F. Harwath SSL VLM 18 7 0 19 May 2023
DUB: Discrete Unit Back-translation for Speech Translation Dong Zhang Rong Ye Tom Ko Mingxuan Wang Yaqian Zhou 11 22 0 19 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition Tiantian Feng Rajat Hebbar Shrikanth Narayanan 25 7 0 18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities Peng Wang Shijie Wang Junyang Lin Shuai Bai Xiaohuan Zhou Jingren Zhou Xinggang Wang Chang Zhou VLM MLLM ObjD 16 113 0 18 May 2023
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation Sicheng Yang Zhiyong Wu Minglei Li Zhensong Zhang Lei Hao Weihong Bao Hao-Wen Zhuang SLR 16 40 0 18 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks Yifan Peng Kwangyoun Kim Felix Wu Brian Yan Siddhant Arora William Chen Jiyang Tang Suwon Shon Prashant Sridhar Shinji Watanabe 19 17 0 18 May 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering Heng-Jui Chang Alexander H. Liu James R. Glass SSL 17 20 0 18 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark Jiatong Shi Dan Berrebbi William Chen Ho-Lam Chung En-Pei Hu ... Xuankai Chang Shang-Wen Li Abdel-rahman Mohamed Hung-yi Lee Shinji Watanabe ELM 47 58 0 18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning Alexander H. Liu Heng-Jui Chang Michael Auli Wei-Ning Hsu James R. Glass 16 24 0 17 May 2023
SoundStorm: Efficient Parallel Audio Generation Zalan Borsos Matthew Sharifi Damien Vincent Eugene Kharitonov Neil Zeghidour Marco Tagliasacchi 15 97 0 16 May 2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models Antoni Bigata Casademunt Rodrigo Mira Nikita Drobyshev Konstantinos Vougioukas Stavros Petridis M. Pantic DiffM 56 1 0 15 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations Wei-wei Lin Chenhang He Man-Wai Mak Youzhi Tu 19 5 0 14 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers Jason (Jinglun) Cai Monica Sunkara Xilai Li Anshu Bhatia Xiao Pan S. Bodapati 15 3 0 11 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration Zhi-Wei Zhong Hao Shi M. Hirano Kazuki Shimada Kazuya Tateishi Takashi Shibuya Shusuke Takahashi Yuki Mitsufuji 13 4 0 11 May 2023
An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild" Edge Applications Heitor R. Guimarães Arthur Pimentel Anderson R. Avila Mehdi Rezagholizadeh Tiago H. Falk SSL 6 2 0 09 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka 14 3 0 09 May 2023
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization Peng Lu Ahmad Rashid I. Kobyzev Mehdi Rezagholizadeh Philippe Langlais 6 0 0 08 May 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models Sicheng Yang Zhiyong Wu Minglei Li Zhensong Zhang Lei Hao Weihong Bao Ming Cheng Long Xiao 22 64 0 08 May 2023
Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification Iván López-Espejo Santi Prieto Alfonso Ortega Eduardo Lleida Solano 9 0 0 03 May 2023
Self-supervised learning for infant cry analysis Arsenii Gorin Cem Subakan Sajjad Abdoli Junhao Wang Samantha Latremouille Charles C. Onu 25 9 0 02 May 2023
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding Juan Pablo Zuluaga Iuliia Nigmatulina Amrutha Prasad P. Motlícek Driss Khalil ... Allan Tart Igor Szöke Vincent Lenders M. Rigault K. Choukri 11 2 0 02 May 2023