ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.02111
  4. Cited By
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

5 January 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
ArXivPDFHTML

Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"

50 / 463 papers shown
Title
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
27
114
0
19 Feb 2024
Language-Codec: Reducing the Gaps Between Discrete Codec Representation
  and Speech Language Models
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Siqi Zheng
Qian Chen
Rongjie Huang
Jialung Zuo
Shulei Wang
Zhou Zhao
AuLLM
24
14
0
19 Feb 2024
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum
  Encoding and Decoding
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding
Yang Ai
Xiao-Hang Jiang
Ye-Xin Lu
Hui-Peng Du
Zhenhua Ling
13
20
0
16 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot
  Text-to-Speech
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
24
8
0
14 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model
  on 100K hours of data
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
33
71
0
12 Feb 2024
Unsupervised Sign Language Translation and Generation
Unsupervised Sign Language Translation and Generation
Zhengsheng Guo
Zhiwei He
Wenxiang Jiao
Xing Wang
Rui Wang
Kehai Chen
Zhaopeng Tu
Yong-mei Xu
Min Zhang
44
0
0
12 Feb 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
19
8
0
12 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
44
32
0
08 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
41
45
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
77
4
0
08 Feb 2024
MusicRL: Aligning Music Generation to Human Preferences
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron
Sertan Girgin
Mauro Verzetti
Damien Vincent
Matej Kastelic
...
Olivier Pietquin
Matthieu Geist
Léonard Hussenot
Neil Zeghidour
A. Agostinelli
26
15
0
06 Feb 2024
Enhancing the Stability of LLM-based Speech Generation Systems through
  Self-Supervised Representations
Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations
Álvaro Martín-Cortinas
Daniel Sáez-Trigueros
Iván Vallés-Pérez
Biel Tura Vecino
Piotr Bilinski
Mateusz Lajszczak
Grzegorz Beringer
Roberto Barra-Chicote
Jaime Lorenzo-Trueba
14
5
0
05 Feb 2024
Spin: An Efficient Secure Computation Framework with GPU Acceleration
Spin: An Efficient Secure Computation Framework with GPU Acceleration
Wuxuan Jiang
Xiangjun Song
Shenbai Hong
Haijun Zhang
Wenxin Liu
Bo Zhao
Wei Xu
Yi Li
13
1
0
04 Feb 2024
Natural language guidance of high-fidelity text-to-speech with synthetic
  annotations
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
13
35
0
02 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
25
3
0
31 Jan 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
35
39
0
30 Jan 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech
  Generation Leveraging NLP Evaluation Metrics
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
Takaaki Saeki
Soumi Maiti
Shinnosuke Takamichi
Shinji Watanabe
Hiroshi Saruwatari
16
11
0
30 Jan 2024
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech
  Generators
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Wiebke Hutiri
Orestis Papakyriakopoulos
Alice Xiang
11
15
0
25 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
22
0
0
25 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
11
25
0
25 Jan 2024
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
Dong Zhang
Xin Zhang
Jun Zhan
Shimin Li
Yaqian Zhou
Xipeng Qiu
AuLLM
BDL
40
16
0
24 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time
  Zero-Shot Voice Conversion
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
13
6
0
19 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided
  Sequence Reordering
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
13
35
0
14 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on
  self-supervised speech-representation model with adapters
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
15
8
0
10 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
35
36
0
09 Jan 2024
Towards ASR Robust Spoken Language Understanding Through In-Context
  Learning With Word Confusion Networks
Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
Kevin Everson
Yile Gu
Huck Yang
Prashanth Gurunath Shivakumar
Guan-Ting Lin
...
Shalini Ghosh
Wael Hamza
Hung-yi Lee
Ariya Rastrow
A. Stolcke
6
5
0
05 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
21
7
0
05 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic
  Token Prediction
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
18
1
0
03 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
14
5
0
02 Jan 2024
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
29
18
0
30 Dec 2023
Improving In-context Learning via Bidirectional Alignment
Improving In-context Learning via Bidirectional Alignment
Chengwei Qin
Wenhan Xia
Fangkai Jiao
Chen Chen
Yuchen Hu
Bosheng Ding
Shafiq R. Joty
35
7
0
28 Dec 2023
Audiobox: Unified Audio Generation with Natural Language Prompts
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
33
74
0
25 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis
  Conditioned on Self-supervised Discrete Speech Representations
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
17
20
0
22 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive
  Text-to-Speech Synthesis
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
23
8
0
17 Dec 2023
T2M-HiFiGPT: Generating High Quality Human Motion from Textual
  Descriptions with Residual Discrete Representations
T2M-HiFiGPT: Generating High Quality Human Motion from Textual Descriptions with Residual Discrete Representations
Congyi Wang
22
4
0
17 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
19
25
0
15 Dec 2023
OpenVoice: Versatile Instant Voice Cloning
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
19
18
0
03 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using
  Synthetic Data and Transfer learning
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
14
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
17
0
0
02 Dec 2023
MoMask: Generative Masked Modeling of 3D Human Motions
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Sen Wang
Li Cheng
VGen
19
113
0
29 Nov 2023
Guided Flows for Generative Modeling and Decision Making
Guided Flows for Generative Modeling and Decision Making
Qinqing Zheng
Matt Le
Neta Shaul
Y. Lipman
Aditya Grover
Ricky T. Q. Chen
11
35
0
22 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
17
31
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech
  Synthesis
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
18
1
0
20 Nov 2023
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Haici Yang
Inseon Jang
Minje Kim
DiffM
24
6
0
14 Nov 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text
InstrumentGen: Generating Sample-Based Musical Instruments From Text
S. Nercessian
Johannes Imort
9
2
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic
  Token Prediction
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
17
10
0
06 Nov 2023
Yet Another Generative Model For Room Impulse Response Estimation
Yet Another Generative Model For Room Impulse Response Estimation
Sungho Lee
Hyeong-Seok Choi
Kyogu Lee
23
10
0
05 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
18
27
0
02 Nov 2023
Generative Pre-training for Speech with Flow Matching
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
19
31
0
25 Oct 2023
Exploring In-Context Learning of Textless Speech Language Model for
  Speech Classification Tasks
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
Ming-Hao Hsu
Kai-Wei Chang
Shang-Wen Li
Hung-yi Lee
26
6
0
19 Oct 2023
Previous
123...106789
Next