Self-Supervised Speech Representations are More Phonetic than Semantic

12 June 2024

Tomohiko Nakamura

Satoru Fukayama

Shinji Watanabe

Papers citing "Self-Supervised Speech Representations are More Phonetic than Semantic"

Title | Authors | Tag | Counts | Date
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels | Santiago Cuervo, Adel Moumen, Yanis Labrak, Sameer Khurana, Antoine Laurent, Mickael Rouvier, R. Marxer | | 72 1 0 | 08 Mar 2025
Discrete Speech Unit Extraction via Independent Component Analysis | Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, Shinji Watanabe | | 43 0 0 | 11 Jan 2025
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang, Hongyu Gong, Changhan Wang, James R. Glass, Yu-An Chung | | 26 0 0 | 31 Oct 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Bohan Li, Hankun Wang, Situo Zhang, Yiwei Guo, Kai Yu | | 31 5 0 | 29 Oct 2024
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions? | Opeyemi Osakuade, Simon King | | 16 0 0 | 25 Oct 2024
Improving Semantic Understanding in Speech Language Models via Brain-tuning | Omer Moussa, Dietrich Klakow, Mariya Toneva | | 29 3 0 | 11 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli | | 32 0 0 | 09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Alan Baade, Puyuan Peng, David F. Harwath | | 42 3 0 | 05 Oct 2024
SpeechTaxi: On Multilingual Semantic Speech Classification | Lennart Keller, Goran Glavaš | | 26 0 0 | 10 Sep 2024
Estimating the Completeness of Discrete Speech Units | Sung-Lin Yeh, Hao Tang | | 17 1 0 | 09 Sep 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget | Andy T. Liu, Yi-Cheng Lin, Haibin Wu, Stefan Winkler, Hung-yi Lee | | 25 1 0 | 09 Sep 2024
Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models | Ramon Sanabria, Hao Tang, Sharon Goldwater | SSL | 23 18 0 | 28 Oct 2022
Opening the Black Box of wav2vec Feature Encoder | Kwanghee Choi, E. Yeo | SSL | 25 15 0 | 27 Oct 2022
Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval | Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee | | 30 37 0 | 21 Jul 2018