ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.02111
  4. Cited By
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

5 January 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
ArXivPDFHTML

Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"

50 / 463 papers shown
Title
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
27
0
0
03 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for
  Taiwanese Hakka
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
Li-Wei Chen
Hung-Shin Lee
Chen-Chi Chang
VLM
22
0
0
03 Sep 2024
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio
  Captioning Performance
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
Jaeyeon Kim
Minjeon Jeon
Jaeyoon Jung
Sang Hoon Woo
Jinjoo Lee
23
2
0
02 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio
  Captioning
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
24
1
0
02 Sep 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient
  Language Model Based Text-to-Speech Synthesis
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Haohan Guo
Fenglong Xie
Kun Xie
Dongchao Yang
Dake Guo
Xixin Wu
Helen Meng
29
4
0
02 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
16
2
0
01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec
  Transformer
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Shunsi Zhang
Zhizheng Wu
23
37
0
01 Sep 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio
  Language Model
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye
Peiwen Sun
Jiahe Lei
Hongzhan Lin
Xu Tan
...
Jianyi Chen
Jiahao Pan
Qifeng Liu
Yike Guo
Wei Xue
AuLLM
19
11
0
30 Aug 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
58
0
0
30 Aug 2024
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Zehai Tu
Guangyan Zhang
Yiting Lu
Adaeze Adigwe
Simon King
Yiwen Guo
27
0
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
30
1
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
49
32
0
29 Aug 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with
  Unified Multilingual Codec Language Modelling
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
18
5
0
28 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion
  of Whispered and Regular Speech
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
17
0
0
21 Aug 2024
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based
  Deepfake Audio?
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Yuankun Xie
Chenxu Xiong
Xiaopeng Wang
Zhiyong Wang
Yi Lu
...
Yukun Liu
Zhengqi Wen
Jianhua Tao
Guanjun Li
Long Ye
AuLLM
26
1
0
20 Aug 2024
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Zhijun Jia
Huaying Xue
Xiulian Peng
Yan Lu
16
1
0
19 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform
  Generation
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
32
5
0
14 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
21
4
0
12 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for
  Speech Processing
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
31
0
0
11 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
Language Model Can Listen While Speaking
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
29
23
0
05 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like
  Spontaneous Representation
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
Generative Expressive Conversational Speech Synthesis
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
51
5
0
31 Jul 2024
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech
  SpeechT5 Model
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
Jan Lehecka
Z. Hanzlícek
J. Matousek
Daniel Tihelka
24
0
0
24 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
33
4
0
22 Jul 2024
Generating Sample-Based Musical Instruments Using Neural Audio Codec
  Language Models
Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models
S. Nercessian
Johannes Imort
Ninon Devis
Frederik Blang
29
1
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep
  Speaker Representation Learning
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
26
4
0
21 Jul 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous
  Behaviors Based on Language Models
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
Weiqin Li
Pei-Yin Yang
Yicheng Zhong
Yixuan Zhou
Zhisheng Wang
Zhiyong Wu
Xixin Wu
Helen M. Meng
25
3
0
18 Jul 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of
  Flow-Matching-Based Zero-Shot Text-to-Speech
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
17
6
0
17 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
27
2
0
16 Jul 2024
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Zhenxiong Tan
Xinyin Ma
Gongfan Fang
Xinchao Wang
23
3
0
15 Jul 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for
  Speech Separation, Recognition, and Synthesis
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Xilin Jiang
Yinghao Aaron Li
Adrian Nicolas Florea
Cong Han
N. Mesgarani
Mamba
38
9
0
13 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image
  Classification
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Yu Qiao
VLM
19
2
0
11 Jul 2024
Live2Diff: Live Stream Translation via Uni-directional Attention in
  Video Diffusion Models
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
Zhening Xing
Gereon Fox
Yanhong Zeng
Xingang Pan
Mohamed A. Elgharib
Christian Theobalt
Kai Chen
VGen
25
3
0
11 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
38
30
0
11 Jul 2024
An Unsupervised Domain Adaptation Method for Locating Manipulated Region
  in partially fake Audio
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
Siding Zeng
Jiangyan Yi
Jianhua Tao
Yujie Chen
Shan Liang
Yong Ren
Xiaohui Zhang
25
0
0
11 Jul 2024
Several questions of visual generation in 2024
Several questions of visual generation in 2024
Shuyang Gu
22
1
0
11 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
39
3
0
07 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for
  Text-to-Speech Speaker Adaptation
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Ruibo Fu
Xin Qi
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xiaopeng Wang
Shuchen Shi
Yukun Liu
Xuefei Liu
Shuai Zhang
42
0
0
07 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for
  Large-Scale Speech Generation
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
25
33
0
07 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for
  Natural Interaction Between Humans and LLMs
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
27
42
0
04 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
37
2
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with
  Discrete Speech Representations
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
39
1
0
03 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference
  Optimization
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
43
3
0
02 Jul 2024
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Lichao Zhang
Rongjie Huang
Siqi Zheng
Zhou Zhao
31
6
0
02 Jul 2024
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
Kenichi Fujita
Takanori Ashihara
Marc Delcroix
Yusuke Ijima
25
2
0
01 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech
  Synthesis
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
30
46
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic
  Alignment
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
37
11
0
25 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
35
0
0
24 Jun 2024
Previous
12345...8910
Next