2011.11588
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
23 November 2020
Tu Nguyen
Maureen de Seyssel
Patricia Roze
M. Rivière
Evgeny Kharitonov
Alexei Baevski
Ewan Dunbar
Emmanuel Dupoux
SSL
Papers citing
"The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling"
50 / 87 papers shown
Latent Speech-Text Transformer
Yen-Ju Lu
Yashesh Gaur
Wei Zhou
Benjamin Muller
Jesus Villalba
...
Luke Zettlemoyer
Gargi Ghosh
Mike Lewis
Srinivasan Iyer
Duc Le
VLM
182
5
0
07 Oct 2025
LongTail-Swap: benchmarking language models' abilities on rare words
Robin Algayres
Charles-Éric Saint-James
Mahi Luthra
Jiayi Shen
Dongyan Lin
Youssef Benchekroun
Rashel Moritz
Juan Pino
Emmanuel Dupoux
145
1
0
05 Oct 2025
Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
María Andrea Cruz Blandón
Zakaria Aldeneh
Jie Chi
Maureen de Seyssel
SSL
203
0
0
22 Sep 2025
Llama-Mimi: Exploring the Limits of Flattened Speech Language Modeling
Issa Sugiura
Shuhei Kurita
Yusuke Oda
Ryuichiro Higashinaka
AuLLM
187
2
0
18 Sep 2025
An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
International Conference on Text, Speech and Dialogue (TSD), 2025
Yanis Labrak
Richard Dufour
Mickael Rouvier
126
1
0
03 Sep 2025
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Greta Tuckute
Klemen Kotar
Evelina Fedorenko
Daniel L. K. Yamins
197
0
0
15 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
296
6
0
12 Aug 2025
Pitch Accent Detection improves Pretrained Automatic Speech Recognition
David Sasu
Natalie Schluter
77
0
0
06 Aug 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
263
2
0
17 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
436
12
0
05 Jun 2025
fastabx: A library for efficient computation of ABX discriminability
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
327
7
0
05 May 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
570
17
0
09 Apr 2025
Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
Phil Woodland
R. Marxer
400
1
0
08 Mar 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
437
31
0
04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
364
6
0
31 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
International Conference on Learning Representations (ICLR), 2024
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
332
20
0
09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
International Conference on Learning Representations (ICLR), 2024
Alan Baade
Puyuan Peng
David Harwath
416
27
0
05 Oct 2024
Recent Advances in Speech Language Models: A Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
747
93
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
International Workshop on Spoken Language Translation (IWSLT), 2024
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
515
14
0
30 Sep 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
AuLLM
OffRL
361
53
0
23 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
263
6
0
16 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
402
9
0
05 Sep 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
272
13
0
16 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
327
10
0
13 Jun 2024
A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech
Oli Danyi Liu
Hao Tang
Naomi H Feldman
Sharon Goldwater
308
3
0
13 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
342
64
0
15 Apr 2024
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Injune Hwang
Kyogu Lee
214
1
0
01 Apr 2024
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
328
26
0
31 Mar 2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Hung-Chieh Fang
Nai-Xuan Ye
Yi-Jen Shih
Puyuan Peng
Hsuan-Fu Wang
Layne Berry
Hung-yi Lee
David Harwath
VLM
267
1
0
08 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
295
133
0
08 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
286
4
0
06 Feb 2024
Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
Jaeyeon Kim
Injune Hwang
Kyogu Lee
143
0
0
02 Feb 2024
Speech foundation models on intelligibility prediction for hearing-impaired listeners
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Santiago Cuervo
R. Marxer
366
18
0
24 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
317
2
0
18 Dec 2023
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
272
1
0
03 Dec 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
303
22
0
08 Oct 2023
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kuan-Po Huang
Chih-Kai Yang
Yu-Kuan Fu
Ewan Dunbar
Hung-yi Lee
417
14
0
04 Oct 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
418
99
0
18 Sep 2023
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
448
92
0
14 Sep 2023
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning
Saurabhchand Bhati
Jesús Villalba
Laureano Moro-Velazquez
Thomas Thebaud
Najim Dehak
CLIP
225
4
0
08 Sep 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
288
20
0
28 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Interspeech (Interspeech), 2023
Tu Nguyen
Wei-Ning Hsu
Antony D'Avirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
305
121
0
10 Aug 2023
What Do Self-Supervised Speech Models Know About Words?
Transactions of the Association for Computational Linguistics (TACL), 2023
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
624
63
0
30 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech (Interspeech), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
296
15
0
14 Jun 2023
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
Interspeech (Interspeech), 2023
Kevin Glocker
Aaricia Herygers
Munir Georges
266
13
0
07 Jun 2023
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
Interspeech (Interspeech), 2023
Marvin Lavechin
Yaya Sy
Hadrien Titeux
María Andrea Cruz Blandón
Okko Räsänen
H. Bredin
Emmanuel Dupoux
Alejandrina Cristià
AuLLM
431
22
0
02 Jun 2023
Zero-Shot Automatic Pronunciation Assessment
Interspeech (Interspeech), 2023
Hongfu Liu
Mingqiang Shi
Ye Wang
264
8
0
31 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
568
103
0
22 May 2023
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
Interspeech (Interspeech), 2023
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
239
19
0
21 May 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Interspeech (Interspeech), 2023
Heng-Jui Chang
Alexander H. Liu
James R. Glass
SSL
267
31
0
18 May 2023