arXiv:2301.02111
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
5 January 2023
Chengyi Wang
Sanyuan Chen
Yu Wu
Ziqiang Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"
50 / 463 papers shown
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
11
22
0
02 Aug 2023
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text
Junchen Zhu
Huan Yang
Wenjing Wang
Huiguo He
Zixi Tuo
...
Wen-Huang Cheng
Lianli Gao
Jingkuan Song
Jianlong Fu
Jiebo Luo
DiffM
20
6
0
31 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
17
8
0
28 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
12
2
0
20 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
20
41
0
14 Jul 2023
VampNet: Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia
Prem Seetharaman
Rithesh Kumar
Bryan Pardo
MGen
26
64
0
10 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu Wu
AuLLM
22
115
0
08 Jul 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Le Zhuo
Ruibin Yuan
Jiahao Pan
Yinghao Ma
Yizhi Li
...
Chenghua Lin
Emmanouil Benetos
Wenhu Chen
Wei Xue
Yike Guo
20
15
0
29 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
14
263
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
35
256
0
22 Jun 2023
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Zhichao Wang
Yuan-Jui Chen
Linfu Xie
Qiao Tian
Yuping Wang
41
30
0
18 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
21
107
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
8
41
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
8
282
0
11 Jun 2023
Simple and Controllable Music Generation
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
19
337
0
08 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
21
72
0
06 Jun 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
27
4
0
06 Jun 2023
Phase perturbation improves channel robustness for speech spoofing countermeasures
Yongyi Zang
You Zhang
Z. Duan
11
2
0
06 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong
Zhiying Huang
Qiao Tian
Chen Xu
Tom Ko
...
Lu Lu
Zejun Ma
Yuping Wang
Mingxuan Wang
Yuxuan Wang
20
22
0
05 Jun 2023
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
Haibin Wu
Kai-Wei Chang
Yuan-Kuei Wu
Hung-yi Lee
14
22
0
03 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yinghao Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yike Guo
Jie Fu
20
104
0
31 May 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
8
0
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
17
0
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
21
26
0
30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
30
65
0
30 May 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang
Brian Yan
Yuya Fujita
Takashi Maekaku
Shinji Watanabe
14
37
0
29 May 2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Tianrui Wang
Long Zhou
Ziqiang Zhang
Yu Wu
Shujie Liu
Yashesh Gaur
Zhuo Chen
Jinyu Li
Furu Wei
32
100
0
25 May 2023
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
21
49
0
25 May 2023
LAraBench: Benchmarking Arabic AI with Large Language Models
Ahmed Abdelali
Hamdy Mubarak
Shammur A. Chowdhury
Maram Hasanain
Basel Mousi
...
Yousseif Elshahawy
Ahmed M. Ali
Nadir Durrani
Natasa Milic-Frayling
Firoj Alam
ELM
LM&MA
13
18
0
24 May 2023
Vision + Language Applications: A Survey
Yutong Zhou
N. Shimada
VLM
18
5
0
24 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
S. Hwang
Eunho Yang
DiffM
15
15
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
25
2
0
23 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
18
10
0
22 May 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
20
20
0
22 May 2023
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
Xue Wang
Tian Zhou
Qingsong Wen
Jinyang Gao
Bolin Ding
Rong Jin
AI4TS
10
35
0
20 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
14
90
0
19 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
43
285
0
18 May 2023
Exploring the Space of Key-Value-Query Models with Intention
M. Garnelo
Wojciech M. Czarnecki
16
7
0
17 May 2023
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
15
97
0
16 May 2023
Integrating Generative Artificial Intelligence in Intelligent Vehicle Systems
Lukas Stappen
J. Dillmann
S. Striegel
Hans-Jörg Vögel
Nicolas Flores-Herr
Björn W. Schuller
11
9
0
15 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
138
118
0
04 May 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
10
219
0
18 Apr 2023
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Amit Kumar Singh Yadav
Kratika Bhagtani
Ziyue Xiang
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
DRL
19
6
0
06 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
33
46
0
21 Mar 2023
Susceptibility to Influence of Large Language Models
Lewis D. Griffin
Bennett Kleinberg
Maximilian Mozes
Kimberly T. Mai
Maria Vau
M. Caldwell
Augustine N. Mavor-Parker
45
14
0
10 Mar 2023
FaceChat: An Emotion-Aware Face-to-face Dialogue Framework
Deema Alnuhait
Qingyang Wu
Zhou Yu
14
7
0
08 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Ziqiang Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
28
170
0
07 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
12
8
0
01 Mar 2023
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
19
532
0
27 Feb 2023