The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

23 November 2020

Papers citing "The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling"

50 / 87 papers shown

Latent Speech-Text Transformer

...

182

07 Oct 2025

LongTail-Swap: benchmarking language models' abilities on rare words

Robin Algayres

Charles-Éric Saint-James

145

05 Oct 2025

Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models

María Andrea Cruz Blandón

203

22 Sep 2025

Llama-Mimi: Exploring the Limits of Flattened Speech Language Modeling

Issa Sugiura

Shuhei Kurita

Yusuke Oda

Ryuichiro Higashinaka

AuLLM

187

18 Sep 2025

An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training

International Conference on Text, Speech and Dialogue (TSD), 2025

Yanis Labrak

Richard Dufour

Mickael Rouvier

126

03 Sep 2025

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

197

15 Aug 2025

Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou

Jiawei Zhou

Karen Livescu

296

12 Aug 2025

Pitch Accent Detection improves Pretrained Automatic Speech Recognition

David Sasu

Natalie Schluter

06 Aug 2025

A Variational Framework for Improving Naturalness in Generative Spoken Language Models

Li-Wei Chen

Takuya Higuchi

Zakaria Aldeneh

Ahmed Hussen Abdelaziz

Alexander I. Rudnicky

263

17 Jun 2025

AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models

436

05 Jun 2025

fastabx: A library for efficient computation of ABX discriminability

Maxime Poli

Emmanuel Chemla

Emmanuel Dupoux

327

05 May 2025

TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling

570

09 Apr 2025

Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs

400

08 Mar 2025

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Guan-Ting Lin

Prashanth Gurunath Shivakumar

437

04 Nov 2024

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

364

31 Oct 2024

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

International Conference on Learning Representations (ICLR), 2024

Cheol Jun Cho

Nicholas Lee

Akshat Gupta

Dhruv Agarwal

Ethan Chen

Alan W Black

Gopala K. Anumanchipalli

332

09 Oct 2024

SyllableLM: Learning Coarse Semantic Units for Speech Language Models

International Conference on Learning Representations (ICLR), 2024

Alan Baade

Puyuan Peng

David Harwath

416

05 Oct 2024

Recent Advances in Speech Language Models: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

747

01 Oct 2024

SSR: Alignment-Aware Modality Connector for Speech Language Models

International Workshop on Spoken Language Translation (IWSLT), 2024

515

30 Sep 2024

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

361

23 Sep 2024

Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Maxime Poli

Emmanuel Chemla

Emmanuel Dupoux

263

16 Sep 2024

LAST: Language Model Aware Speech Tokenization

A. Turetzky

Yossi Adi

402

05 Sep 2024

NAST: Noise Aware Speech Tokenization for Speech Language Models

Shoval Messica

Yossi Adi

272

16 Jun 2024

Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations

327

13 Jun 2024

A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

Oli Danyi Liu

Hao Tang

Naomi H Feldman

Sharon Goldwater

308

13 May 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

342

15 Apr 2024

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling

Injune Hwang

Kyogu Lee

214

01 Apr 2024

Scaling Properties of Speech Language Models

Santiago Cuervo

R. Marxer

328

31 Mar 2024

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Hung-yi Lee

267

08 Feb 2024

SpiRit-LM: Interleaved Spoken and Written Language Model

...

295

133

08 Feb 2024

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Hung-yi Lee

286

06 Feb 2024

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations

Jaeyeon Kim

Injune Hwang

Kyogu Lee

143

02 Feb 2024

Speech foundation models on intelligibility prediction for hearing-impaired listeners

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Santiago Cuervo

R. Marxer

366

24 Jan 2024

Efficiency-oriented approaches for self-supervised speech representation learning

Luis Lugo

Valentin Vielzeuf

SSL

317

18 Dec 2023

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

Sean Robertson

Ewan Dunbar

SSL

272

03 Dec 2023

Generative Spoken Language Model based on continuous word-sized audio tokens

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yossi Adi

303

08 Oct 2023

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Kuan-Po Huang

Chih-Kai Yang

Yu-Kuan Fu

Ewan Dunbar

Hung-yi Lee

417

04 Oct 2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

Hung-yi Lee

418

18 Sep 2023

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

448

14 Sep 2023

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning

Saurabhchand Bhati

Jesús Villalba

Laureano Moro-Velazquez

Thomas Thebaud

Najim Dehak

CLIP

225

08 Sep 2023

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Computer Speech and Language (CSL), 2023

Mirco Ravanelli

288

28 Aug 2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Interspeech, 2023

...

Yossi Adi

305

121

10 Aug 2023

What Do Self-Supervised Speech Models Know About Words?

Transactions of the Association for Computational Linguistics (TACL), 2023

624

30 Jun 2023

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Interspeech, 2023

296

14 Jun 2023

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

Interspeech, 2023

Kevin Glocker

Aaricia Herygers

Munir Georges

266

07 Jun 2023

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models

Interspeech, 2023

Marvin Lavechin

Yaya Sy

Hadrien Titeux

María Andrea Cruz Blandón

431

02 Jun 2023

Zero-Shot Automatic Pronunciation Assessment

Interspeech, 2023

Hongfu Liu

Mingqiang Shi

Ye Wang

264

31 May 2023

Textually Pretrained Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

568

103

22 May 2023

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

Interspeech, 2023

Oli Danyi Liu

Hao Tang

Sharon Goldwater

SSL

239

21 May 2023

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Interspeech, 2023

267

18 May 2023