SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description

ACM Multimedia (MM), 2024
24 August 2024
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
ArXiv (abs), PDF, HTML, GitHub (184★)

Papers citing "SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description"

15 / 15 papers shown
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
749
13
0
16 Nov 2025
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Wenming Tu
Guanrou Yang
Ruiqi Yan
Wenxi Chen
Ziyang Ma
Yipeng Kang
Kai Yu
Xie Chen
Zilong Zheng
181
1
0
26 Oct 2025
HiStyle: Hierarchical Style Embedding Predictor for Text-Prompt-Guided Controllable Speech Synthesis
Ziyu Zhang
Hanzhao Li
Jingbin Hu
W. Li
Lei Xie
153
1
0
30 Sep 2025
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
Heyang Xue
Xuchen Song
Yu Tang
J. Chen
Yanru Chen
Yang Li
Yahui Zhou
MoE
173
2
0
15 Aug 2025
$\text{M}^3\text{PDB}$: A Multimodal, Multi-Label, Multilingual Prompt Database for Speech Generation
B. Zhu
Cheng Gong
Muyang Wu
Ruihao Jing
Fan Liu
Xiaolei Zhang
Chi Zhang
Xuelong Li
207
0
0
13 Aug 2025
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems
Kexin Huang
Qian Tu
Liwei Fan
Chenchen Yang
Dong Zhang
Shimin Li
Zhaoye Fei
Qinyuan Cheng
Xipeng Qiu
330
11
0
19 Jun 2025
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
Haoqin Sun
Jingguang Tian
Jiaming Zhou
Hui Wang
Jiabei He
...
Xiangyu Kong
Desheng Hu
Xinkang Xu
Xinhui Hu
Yong Qin
275
5
0
26 May 2025
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
Li Liu
416
2
0
15 Apr 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Longji Xu
Chao Zhang
AuLLM
572
16
0
26 Mar 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David Harwath
Eunsol Choi
CLIP, VLM
562
18
0
06 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xiang Wang
Mingqi Jiang
Tianhao Shen
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Xu Tan
Wei Xue
362
135
0
03 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
1.0K
3
0
01 Mar 2025
Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
LM&MA, AuLLM
445
15
0
25 Jan 2025
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
ACM Multimedia (MM), 2024
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
398
30
0
28 Aug 2024
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM, LRM
1.0K
966
0
19 Sep 2023