The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

23 November 2020
Tu Nguyen
Maureen de Seyssel
Patricia Roze
M. Rivière
Evgeny Kharitonov
Alexei Baevski
Ewan Dunbar
Emmanuel Dupoux
    SSL

Papers citing "The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling"

50 / 87 papers shown
Latent Speech-Text Transformer
Yen-Ju Lu
Yashesh Gaur
Wei Zhou
Benjamin Muller
Jesus Villalba
...
Luke Zettlemoyer
Gargi Ghosh
Mike Lewis
Srinivasan Iyer
Duc Le
VLM
182
5
0
07 Oct 2025
LongTail-Swap: benchmarking language models' abilities on rare words
Robin Algayres
Charles-Éric Saint-James
Mahi Luthra
Jiayi Shen
Dongyan Lin
Youssef Benchekroun
Rashel Moritz
Juan Pino
Emmanuel Dupoux
145
1
0
05 Oct 2025
Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
María Andrea Cruz Blandón
Zakaria Aldeneh
Jie Chi
Maureen de Seyssel
SSL
203
0
0
22 Sep 2025
Llama-Mimi: Exploring the Limits of Flattened Speech Language Modeling
Issa Sugiura
Shuhei Kurita
Yusuke Oda
Ryuichiro Higashinaka
AuLLM
187
2
0
18 Sep 2025
An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
International Conference on Text, Speech and Dialogue (TSD), 2025
Yanis Labrak
Richard Dufour
Mickael Rouvier
126
1
0
03 Sep 2025
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Greta Tuckute
Klemen Kotar
Evelina Fedorenko
Daniel L. K. Yamins
197
0
0
15 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
296
6
0
12 Aug 2025
Pitch Accent Detection improves Pretrained Automatic Speech Recognition
David Sasu
Natalie Schluter
77
0
0
06 Aug 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
263
2
0
17 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
436
12
0
05 Jun 2025
fastabx: A library for efficient computation of ABX discriminability
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
327
7
0
05 May 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
570
17
0
09 Apr 2025
Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
Phil Woodland
R. Marxer
400
1
0
08 Mar 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
437
31
0
04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
364
6
0
31 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
International Conference on Learning Representations (ICLR), 2024
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
332
20
0
09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
International Conference on Learning Representations (ICLR), 2024
Alan Baade
Puyuan Peng
David Harwath
416
27
0
05 Oct 2024
Recent Advances in Speech Language Models: A Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
747
93
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
International Workshop on Spoken Language Translation (IWSLT), 2024
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
515
14
0
30 Sep 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
AuLLM, OffRL
361
53
0
23 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
263
6
0
16 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
402
9
0
05 Sep 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
272
13
0
16 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
327
10
0
13 Jun 2024
A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech
Oli Danyi Liu
Hao Tang
Naomi H Feldman
Sharon Goldwater
308
3
0
13 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
342
64
0
15 Apr 2024
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Injune Hwang
Kyogu Lee
214
1
0
01 Apr 2024
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
328
26
0
31 Mar 2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Hung-Chieh Fang
Nai-Xuan Ye
Yi-Jen Shih
Puyuan Peng
Hsuan-Fu Wang
Layne Berry
Hung-yi Lee
David Harwath
VLM
267
1
0
08 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM, VLM
295
133
0
08 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
286
4
0
06 Feb 2024
Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
Jaeyeon Kim
Injune Hwang
Kyogu Lee
143
0
0
02 Feb 2024
Speech foundation models on intelligibility prediction for hearing-impaired listeners
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Santiago Cuervo
R. Marxer
366
18
0
24 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
317
2
0
18 Dec 2023
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
272
1
0
03 Dec 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
303
22
0
08 Oct 2023
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kuan-Po Huang
Chih-Kai Yang
Yu-Kuan Fu
Ewan Dunbar
Hung-yi Lee
417
14
0
04 Oct 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
418
99
0
18 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM, AuLLM
448
92
0
14 Sep 2023
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning
Saurabhchand Bhati
Jesús Villalba
Laureano Moro-Velazquez
Thomas Thebaud
Najim Dehak
CLIP
225
4
0
08 Sep 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
288
20
0
28 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Interspeech (Interspeech), 2023
Tu Nguyen
Wei-Ning Hsu
Antony D'Avirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
305
121
0
10 Aug 2023
What Do Self-Supervised Speech Models Know About Words?
Transactions of the Association for Computational Linguistics (TACL), 2023
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
624
63
0
30 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech (Interspeech), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL, ELM
296
15
0
14 Jun 2023
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
Interspeech (Interspeech), 2023
Kevin Glocker
Aaricia Herygers
Munir Georges
266
13
0
07 Jun 2023
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
Interspeech (Interspeech), 2023
Marvin Lavechin
Yaya Sy
Hadrien Titeux
María Andrea Cruz Blandón
Okko Räsänen
H. Bredin
Emmanuel Dupoux
Alejandrina Cristià
AuLLM
431
22
0
02 Jun 2023
Zero-Shot Automatic Pronunciation Assessment
Interspeech (Interspeech), 2023
Hongfu Liu
Mingqiang Shi
Ye Wang
264
8
0
31 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM, SyDa
568
103
0
22 May 2023
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
Interspeech (Interspeech), 2023
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
239
19
0
21 May 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Interspeech (Interspeech), 2023
Heng-Jui Chang
Alexander H. Liu
James R. Glass
SSL
267
31
0
18 May 2023