ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.07691
  4. Cited By
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion
  and Adversarial Training with Large Speech Language Models

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

13 June 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
    VLM
    DiffM
ArXivPDFHTML

Papers citing "StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models"

19 / 69 papers shown
Title
MAD Speech: Measures of Acoustic Diversity of Speech
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
46
1
0
16 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using
  Hypernetworks
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
30
2
0
06 Apr 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
33
143
0
05 Mar 2024
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir D. Memon
29
0
0
28 Feb 2024
Large Multimodal Agents: A Survey
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot
  Text-to-Speech
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
32
9
0
14 Feb 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis
  Conditioned on Self-supervised Discrete Speech Representations
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
19
20
0
22 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
25
31
0
21 Nov 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise
  Filter and Inverse Short Time Fourier Transform
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
30
4
0
18 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
37
40
0
05 Sep 2023
SLMGAN: Exploiting Speech Language Model Representations for
  Unsupervised Zero-Shot Voice Conversion in GANs
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
Yinghao Aaron Li
Cong Han
N. Mesgarani
20
5
0
18 Jul 2023
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational
  Text-to-Speech Synthesis
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis
Jinlong Xue
Yayue Deng
Fengping Wang
Ya Li
Yingming Gao
J. Tao
Jianqing Sun
Jiaen Liang
21
8
0
03 May 2023
Zero-shot text-to-speech synthesis conditioned using self-supervised
  speech representation model
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
Kenichi Fujita
Takanori Ashihara
Hiroki Kanagawa
Takafumi Moriya
Yusuke Ijima
30
10
0
24 Apr 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
23
7
0
06 Mar 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
43
637
0
05 Jan 2023
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on
  Fixed-Point Iteration
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
42
29
0
03 Oct 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse
  Text-to-Speech Synthesis
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising
  Diffusion GANs
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
68
65
0
28 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
174
377
0
04 Dec 2021
Previous
12