Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding

18 May 2020
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha

Papers citing "Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding"

38 / 38 papers shown
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yu Zhang
Wenxiang Guo
Changhao Pan
Dongyu Yao
Zhiyuan Zhu
Ziyue Jiang
Yuhan Wang
Tao Jin
Zhou Zhao
VLM
666
11
0
20 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
450
6
0
01 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
750
9
0
29 Apr 2025
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
429
5
0
24 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech (Interspeech), 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
394
254
0
07 Jun 2024
Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
251
12
0
07 May 2024
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yu Zhang
Rongjie Huang
Ruiqi Li
Jinzheng He
Yan Xia
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
VLM
564
43
0
17 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
319
69
0
06 Dec 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
ACM Multimedia (ACM MM), 2023
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
268
95
0
26 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Interspeech (Interspeech), 2023
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
211
3
0
26 Oct 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
213
4
0
21 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
326
2
0
06 Sep 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Interspeech (Interspeech), 2023
Wen Wang
Yang Song
S. Jha
227
15
0
24 Aug 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
International Conference on Learning Representations (ICLR), 2023
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
367
74
0
14 Jul 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Interspeech (Interspeech), 2023
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
233
1
0
26 May 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
265
7
0
21 Mar 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes
PLoS ONE (PLoS ONE), 2023
Kimberly T. Mai
Sergi D. Bray
Toby O. Davies
Lewis D. Griffin
362
74
0
19 Jan 2023
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
202
0
0
28 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
302
6
0
25 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
238
27
0
21 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Detai Xin
Hiroshi Saruwatari
178
4
0
18 Oct 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Interspeech (Interspeech), 2022
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Jane Polak Scowcroft
DiffM
224
14
0
05 Jul 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Spoken Language Technology Workshop (SLT), 2022
Florian Lux
Julia Koch
Ngoc Thang Vu
259
25
0
24 Jun 2022
Fine-grained Noise Control for Multispeaker Speech Synthesis
Interspeech (Interspeech), 2022
Karolos Nikitaras
G. Vamvoukakis
Nikolaos Ellinas
Konstantinos Klapsas
K. Markopoulos
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
232
5
0
11 Apr 2022
Self-supervised learning for robust voice cloning
Interspeech (Interspeech), 2022
Konstantinos Klapsas
Nikolaos Ellinas
Karolos Nikitaras
G. Vamvoukakis
Panos Kakoulidis
...
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
SSL
270
7
0
07 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Interspeech (Interspeech), 2022
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
389
28
0
03 Apr 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Interspeech (Interspeech), 2022
Edresson Casanova
C. Shulby
Alexander Korolev
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
387
19
0
29 Mar 2022
Attacker Attribution of Audio Deepfakes
Interspeech (Interspeech), 2022
Nicolas Müller
Franziska Dieckmann
Jennifer Williams
152
24
0
28 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
International Conference on Digital Signal Processing (DSP), 2022
Pengyu Cheng
Zhenhua Ling
229
4
0
02 Mar 2022
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Adam Gabryś
Goeric Huybrechts
M. Ribeiro
C. Chien
Julian Roth
Giulia Comini
Roberto Barra-Chicote
Bartek Perz
Jaime Lorenzo-Trueba
241
29
0
16 Feb 2022
MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder
Shoutong Wang
Jinglin Liu
Yi Ren
Zhen Wang
Changliang Xu
Zhou Zhao
126
7
0
11 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
838
585
0
04 Dec 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
293
39
0
07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
661
75
0
07 Nov 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Qinghua Wu
Quanbo Shen
Jian Luan
YuJun Wang
274
4
0
07 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
466
446
0
29 Jun 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Interspeech (Interspeech), 2021
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
296
113
0
02 Apr 2021
A Survey on Machine Learning from Few Samples
Pattern Recognition (Pattern Recognit.), 2020
Jiang Lu
Pinghua Gong
Jieping Ye
Jianwei Zhang
Changshu Zhang
377
81
0
06 Sep 2020