Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

arXiv: 2303.03926

7 March 2023
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
    VLM

Papers citing "Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling"

50 / 67 papers shown
How I Built ASR for Endangered Languages with a Spoken Dictionary
Christopher Bartley
Anton Ragni
12
0
0
06 Oct 2025
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
Baher Mohammad
Magauiya Zhussip
Stamatios Lefkimmiatis
Mamba
16
0
0
06 Oct 2025
Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
Hieu-Nghia Huynh-Nguyen
Huynh Nguyen Dang
Ngoc Son Nguyen
Van Nguyen
8
0
0
03 Oct 2025
Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
Junjie Cao
Yichen Han
Ruonan Zhang
Xiaoyang Hao
Hongxiang Li
Shuaijiang Zhao
Yue Liu
Xiao-Ping Zhang
23
0
0
26 Sep 2025
Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Qingyu Liu
Y. Chen
Zhikang Niu
Chunhui Wang
Yunting Yang
Bowen Zhang
Jian Zhao
Pengcheng Zhu
K. Yu
Xie Chen
0
0
0
18 Sep 2025
Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
Chenlin Liu
Minghui Fang
Patrick Zhang
Wei Zhou
Jie Gao
Jiqing Han
68
0
0
21 Aug 2025
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
Chunyu Qiang
Haoyu Wang
Cheng Gong
Tianrui Wang
Ruibo Fu
...
Zhengqi Wen
C. Zhang
Longbiao Wang
Jianwu Dang
J. Tao
40
0
0
04 Aug 2025
Next Tokens Denoising for Speech Synthesis
Yanqing Liu
Ruiqing Xue
C. Zhang
Yufei Liu
G. Wang
Bohan Li
Yao Qian
Lei He
Shujie Liu
Sheng Zhao
DiffM
70
1
0
30 Jul 2025
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
Kwesi Cobbina
Tianyi Zhou
42
1
0
30 Jul 2025
Data Augmentation for Spoken Grammatical Error Correction
Penny Karanasou
Mengjie Qian
Stefano Bannò
Mark Gales
Kate Knill
66
1
0
25 Jul 2025
Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey
Jindong Li
Yali Fu
Jiahong Liu
Linxiao Cao
Wei Ji
Menglin Yang
Irwin King
Ming-Hsuan Yang
OffRL
70
0
0
21 Jul 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
157
0
0
12 Jun 2025
Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages
Utkarsh Pathak
Chandra Sai Krishna Gunda
Anusha Prakash
Keshav Agarwal
Hema A. Murthy
134
0
0
04 Jun 2025
DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation
Ming Meng
Ziyi Yang
Jian Yang
Zhenjie Su
Yonggui Zhu
Zhaoxin Fan
DiffM
VLM
108
3
0
01 Jun 2025
Voice Adaptation for Swiss German
Samuel Stucki
Jan Deriu
Mark Cieliebak
92
0
0
28 May 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
111
1
0
26 May 2025
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
Haiyang Sun
Shujie Hu
Shujie Liu
L. Meng
Hui Wang
...
Yifan Yang
Yanqing Liu
Sheng Zhao
Yan Lu
Y. Qian
146
3
0
26 May 2025
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
Zijian Lin
Yang Zhang
Yougen Yuan
Yuming Yan
Jinjiang Liu
Zhiyong Wu
Pengfei Hu
Qun Yu
179
0
0
21 May 2025
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
Junchuan Zhao
Xintong Wang
Ye Wang
66
3
0
21 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
100
0
0
04 May 2025
ClonEval: An Open Voice Cloning Benchmark
Iwona Christop
Tomasz Kuczyński
Marek Kubis
AuLLM
116
0
0
29 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
193
8
0
03 Apr 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Wenjie Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
298
19
0
04 Mar 2025
Everyday Speech in the Indian Subcontinent
Utkarsh Pathak
160
1
0
24 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
Jiajian Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
164
10
0
16 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
197
14
0
07 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
230
9
0
28 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
152
1
0
10 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
172
0
0
31 Dec 2024
The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
Shuoyi Zhou
Yixuan Zhou
Weiqing Li
Jun Chen
Runchuan Ye
Weihao Wu
Zijian Lin
Shun Lei
Zhiyong Wu
268
1
0
02 Dec 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
296
8
0
22 Nov 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
281
197
0
09 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
96
0
0
07 Oct 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
180
8
0
25 Sep 2024
Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora
Jing Xu
Daxin Tan
J. Wang
Xiao Chen
114
0
0
17 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
168
3
0
14 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
154
3
0
10 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
188
62
0
05 Sep 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
162
15
0
28 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
J. Tao
AI4TS
151
4
0
11 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
152
2
0
01 Aug 2024
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
Zhening Xing
Gereon Fox
Yanhong Zeng
Xingang Pan
Mohamed A. Elgharib
Christian Theobalt
Kai Chen
VGen
131
4
0
11 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
206
67
0
11 Jul 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
116
5
0
11 Jun 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
133
36
0
08 May 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
145
28
0
23 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
120
10
0
10 Apr 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
84
7
0
19 Mar 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
163
51
0
20 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
175
101
0
12 Feb 2024