SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

International Conference on Learning Representations (ICLR), 2023

Papers citing "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"

Code Drift: Towards Idempotent Neural Audio Codecs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
14 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
International Conference on Learning Representations (ICLR), 2024
05 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
26 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2024
24 Sep 2024
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Spoken Language Technology Workshop (SLT), 2024
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kaiwei Chang
Jiawei Du
...
Yi-Chiao Wu
Xu Tan
James Glass
Shinji Watanabe
Hung-yi Lee
21 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
International Conference on Learning Representations (ICLR), 2024
01 Sep 2024