arXiv:2301.02111
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
5 January 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"
50 / 463 papers shown
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
37
2
0
13 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
30
81
0
09 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
36
14
0
08 May 2024
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model
Jiexia Ye
Weiqi Zhang
Ke Yi
Yongzi Yu
Ziyue Li
Jia Li
Fugee Tsung
AI4TS
AI4CE
43
23
0
03 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
16
18
0
30 Apr 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
29
1
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
34
4
0
30 Apr 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
19
4
0
30 Apr 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
32
10
0
28 Apr 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
41
0
0
27 Apr 2024
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection
Xuanyu Zhang
You-song Xu
Runyi Li
Jiwen Yu
Weiqi Li
Zhipei Xu
Jian Andrew Zhang
VGen
36
16
0
25 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Simin Niu
Zhiyu Li
62
7
0
25 Apr 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
18
2
0
24 Apr 2024
Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities
Siyin Wang
Chao-Han Huck Yang
Ji Wu
Chao Zhang
BDL
32
4
0
23 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
35
16
0
23 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
46
1
0
16 Apr 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Xincan Feng
A. Yoshimoto
30
2
0
10 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
26
1
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
37
4
0
10 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
28
1
0
09 Apr 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALM
AuLLM
32
15
0
08 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
24
3
0
07 Apr 2024
Cross-Domain Audio Deepfake Detection: Dataset and Analysis
Yuang Li
Min Zhang
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Hao Yang
30
3
0
07 Apr 2024
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Jiayang Li
Jiale Li
37
7
0
06 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
29
23
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
65
39
0
03 Apr 2024
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Yu Pan
Lei Ma
Jianjun Zhao
29
4
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
28
20
0
03 Apr 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Matthew Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
33
3
0
02 Apr 2024
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Pu Wang
Minwoo Lee
Srijan Das
C. L. P. Chen
VGen
27
20
0
28 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David F. Harwath
57
55
0
25 Mar 2024
The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
Alice Baird
Rachel Manzelli
Panagiotis Tzirakis
Chris Gagne
Haoqi Li
Sadie Allen
Sander Dieleman
Brian Kulis
Shrikanth S. Narayanan
Alan S. Cowen
21
0
0
21 Mar 2024
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito
22
0
0
20 Mar 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
28
4
0
19 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
22
1
0
19 Mar 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
38
9
0
18 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
24
2
0
14 Mar 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
19
0
0
13 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
16
2
0
08 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
28
139
0
05 Mar 2024
Controllable Prompt Tuning For Balancing Group Distributional Robustness
Hoang Phan
Andrew Gordon Wilson
Qi Lei
36
5
0
05 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
25
0
0
01 Mar 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
30
12
0
29 Feb 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
21
13
0
29 Feb 2024
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir D. Memon
29
0
0
28 Feb 2024
High-Fidelity Neural Phonetic Posteriorgrams
Cameron Churchwell
Max Morrison
Bryan Pardo
24
4
0
27 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
20
6
0
25 Feb 2024
Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer
Amit Kumar Singh Yadav
Ziyue Xiang
Kratika Bhagtani
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
ViT
33
2
0
22 Feb 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
30
28
0
20 Feb 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
A. Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
35
12
0
20 Feb 2024