ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.02111
  4. Cited By
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

5 January 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
ArXivPDFHTML

Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"

50 / 463 papers shown
Title
Character-LLM: A Trainable Agent for Role-Playing
Character-LLM: A Trainable Agent for Role-Playing
Yunfan Shao
Linyang Li
Junqi Dai
Xipeng Qiu
LLMAG
17
208
0
16 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
18
8
0
16 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for
  Speech Recognition and Translation
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
21
48
0
13 Oct 2023
Low-latency Speech Enhancement via Speech Token Generation
Low-latency Speech Enhancement via Speech Token Generation
Huaying Xue
Xiulian Peng
Yan Lu
14
0
0
13 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
18
19
0
12 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech
  generation
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
29
16
0
11 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
26
3
0
09 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning
  Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
17
0
0
08 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
31
79
0
07 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling
  for Zero-Shot Voice Cloning
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
23
3
0
06 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform
  Generation
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Roi Benita
Michael Elad
Joseph Keshet
DiffM
17
7
0
02 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
11
4
0
01 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
20
114
0
01 Oct 2023
Knowledge Engineering using Large Language Models
Knowledge Engineering using Large Language Models
Bradley Paul Allen
Lise Stork
Paul T. Groth
13
24
0
01 Oct 2023
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model
  Adaptation
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv
Itai Gat
Sagie Benaim
Lior Wolf
Idan Schwartz
Yossi Adi
DiffM
VGen
29
36
0
28 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard
  Parameter Sharing
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
8
2
0
27 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with
  Discrete Speech Units: A Comparative Study
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
26
36
0
27 Sep 2023
High-Fidelity Speech Synthesis with Minimal Supervision: All Using
  Diffusion Models
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
20
2
0
27 Sep 2023
Self-Recovery Prompting: Promptable General Purpose Service Robot System
  with Foundation Models and Self-Recovery
Self-Recovery Prompting: Promptable General Purpose Service Robot System with Foundation Models and Self-Recovery
Mimo Shirasaka
T. Matsushima
S. Tsunashima
Yuya Ikeda
Aoi Horo
...
Chikaha Tsuji
Hikaru Wada
Tsunekazu Omija
Dai Komukai
Yusuke Iwasawa
LRM
11
10
0
25 Sep 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
13
5
0
25 Sep 2023
Rethinking Internet Communication Through LLMs: How Close Are We?
Rethinking Internet Communication Through LLMs: How Close Are We?
Sifat Ut Taki
Spyridon Mastorakis
14
0
0
25 Sep 2023
Speaker anonymization using neural audio codec language models
Speaker anonymization using neural audio codec language models
Michele Panariello
Francesco Nespoli
Massimiliano Todisco
Nicholas W. D. Evans
4
15
0
25 Sep 2023
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech
  Data
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
11
12
0
25 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with
  Multi-Scale Acoustic Prompts
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
23
2
0
21 Sep 2023
Speak While You Think: Streaming Speech Synthesis During Text Generation
Speak While You Think: Streaming Speech Synthesis During Text Generation
Avihu Dekel
Slava Shechtman
Raul Fernandez
David Haws
Zvi Kons
R. Hoory
11
8
0
20 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for
  Speaker and Speech Recognition
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
11
12
0
19 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained
  Generative Methods for Speech Enhancement in Adverse Conditions
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
H. M. Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
18
3
0
16 Sep 2023
Stack-and-Delay: a new codebook pattern for music generation
Stack-and-Delay: a new codebook pattern for music generation
Gaël Le Lan
Varun K. Nagaraja
Ernie Chang
David Kant
Zhaoheng Ni
Yangyang Shi
Forrest N. Iandola
Vikas Chandra
BDL
38
7
0
15 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
14
32
0
15 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
17
3
0
14 Sep 2023
Towards Artificial General Intelligence (AGI) in the Internet of Things
  (IoT): Opportunities and Challenges
Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges
Fei Dou
Jin Ye
Geng Yuan
Qin Lu
Wei Niu
...
Hongyue Sun
Yunli Shao
Changying Li
Tianming Liu
Wenzhan Song
AI4CE
13
28
0
14 Sep 2023
SpatialCodec: Neural Spatial Speech Coding
SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu
Yong-mei Xu
Vinay Kothapally
Heming Wang
Muqiao Yang
Dong Yu
21
1
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit
  for Neural Speech Codec
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
13
54
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech
  recognition/synthesis and speech/text continuation tasks
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
16
54
0
14 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Y. Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
11
2
0
08 Sep 2023
An Efficient Temporary Deepfake Location Approach Based Embeddings for
  Partially Spoofed Audio Detection
An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection
Yuankun Xie
Haonan Cheng
Yutian Wang
Long Ye
14
6
0
06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge
  2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
11
2
0
06 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
29
40
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
4
22
0
31 Aug 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via
  Vector-Quantized Self-Supervised Speech Representation Learning
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
30
3
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language
  Models
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
25
61
0
31 Aug 2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language
  Text-to-Speech Models
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Ziyue Jiang
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
22
34
0
28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
36
0
24 Aug 2023
WavMark: Watermarking for Audio Generation
WavMark: Watermarking for Audio Generation
Guang Chen
Yu-Huan Wu
Shujie Liu
Tao Liu
Xiaoyong Du
Furu Wei
9
30
0
24 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
24
1
0
14 Aug 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
13
79
0
14 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic
  Talking-head Generation
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
Zhichao Wang
M. Dai
Keld Lundgaard
VGen
DiffM
28
2
0
12 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech
  Resynthesis
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Tu Nguyen
Wei-Ning Hsu
Antony DÁvirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
24
53
0
10 Aug 2023
MSAC: Multiple Speech Attribute Control Method for Reliable Speech
  Emotion Recognition
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
Y. Pan
Yuguang Yang
Yuheng Huang
Jixun Yao
Jingjing Yin
Yanni Hu
Heng Lu
Lei Ma
Jianjun Zhao
20
5
0
08 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text
  Representation Learning with Unit-to-Unit Translation
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
27
10
0
03 Aug 2023
Previous
123...10789
Next