Scaling Rich Style-Prompted Text-to-Speech Datasets

6 March 2025
Anuj Diwan
Zhisheng Zheng
David Harwath
Eunsol Choi
CLIP, VLM

Papers citing "Scaling Rich Style-Prompted Text-to-Speech Datasets"

9 / 9 papers shown
Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation
Wei-Cheng Tseng
Xuanru Zhou
Mingyue Huo
Yiwen Shao
Hao Zhang
Dong Yu
CLIP, AI4TS, VLM
205
1
0
20 Nov 2025
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
Mingyue Huo
Wei-Cheng Tseng
Yiwen Shao
Hao Zhang
Dong Yu
AuLLM
535
3
0
19 Nov 2025
VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks
Efthymios Tsaprazlis
Thanathai Lertpetchpun
Tiantian Feng
Sai Praneeth Karimireddy
Zengyi Qin
271
0
0
22 Sep 2025
Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation
Xueyao Zhang
Junan Zhang
Yuancheng Wang
Chaoren Wang
Yuanzhe Chen
Dongya Jia
Zhuo Chen
Zhizheng Wu
DiffM
329
6
0
22 Aug 2025
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
Heyang Xue
Xuchen Song
Yu Tang
J. Chen
Yanru Chen
Yang Li
Yahui Zhou
MoE
172
2
0
15 Aug 2025
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
Tiantian Feng
Kevin Huang
Anfeng Xu
Xuan Shi
Thanathai Lertpetchpun
Jihwan Lee
Yoonjeong Lee
D. Byrd
Shrikanth Narayanan
201
9
0
03 Aug 2025
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems
Kexin Huang
Qian Tu
Liwei Fan
Chenchen Yang
Dong Zhang
Shimin Li
Zhaoye Fei
Qinyuan Cheng
Xipeng Qiu
329
11
0
19 Jun 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
286
0
0
26 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Tiantian Feng
Jihwan Lee
Anfeng Xu
Yoonjeong Lee
Thanathai Lertpetchpun
...
Thomas Thebaud
Laureano Moro-Velazquez
D. Byrd
Najim Dehak
Zengyi Qin
337
18
0
20 May 2025