ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.02152
  4. Cited By
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

5 April 2022
Takaaki Saeki
Detai Xin
Wataru Nakata
Tomoki Koriyama
Shinnosuke Takamichi
Hiroshi Saruwatari
ArXivPDFHTML

Papers citing "UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022"

50 / 116 papers shown
Title
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
43
0
0
14 Sep 2024
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from
  Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic
  Speech
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
Kaito Baba
Wataru Nakata
Yuki Saito
Hiroshi Saruwatari
VLM
20
7
0
14 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
30
2
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
24
1
0
13 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic
  Framework and its Applicability in Automatic Pronunciation Assessment
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Tien-Hong Lo
Meng-Ting Tsai
Berlin Chen
20
0
0
11 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
25
29
0
10 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with
  Adversarial Conditional Diffusion Distillation
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
26
0
0
03 Sep 2024
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement
  Face-based Voice Conversion
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong
Li Liu
19
3
0
01 Sep 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio
  Language Model
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye
Peiwen Sun
Jiahe Lei
Hongzhan Lin
Xu Tan
...
Jianyi Chen
Jiahao Pan
Qifeng Liu
Yike Guo
Wei Xue
AuLLM
17
11
0
30 Aug 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
52
0
0
30 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
49
32
0
29 Aug 2024
Self-supervised Speech Representations Still Struggle with African
  American Vernacular English
Self-supervised Speech Representations Still Struggle with African American Vernacular English
Kalvin Chang
Yi-Hui Chou
Jiatong Shi
Hsuan-Ming Chen
Nicole Holliday
Odette Scharenborg
David R. Mortensen
28
2
0
26 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level
  Coding
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
20
3
0
08 Aug 2024
TTSDS -- Text-to-Speech Distribution Score
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondˇrej Klejch
Peter Bell
24
0
0
17 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech
  Translation
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
36
0
0
08 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for
  Text-to-Speech Speaker Adaptation
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Ruibo Fu
Xin Qi
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xiaopeng Wang
Shuchen Shi
Yukun Liu
Xuefei Liu
Shuai Zhang
42
0
0
07 Jul 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot
  Speaker Adaptive Text-to-Speech
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
22
5
0
21 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
27
12
0
20 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
24
1
0
18 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS
  Prediction
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
21
8
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
29
9
0
15 Jun 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech
  Representation from Self-supervised Learning Model
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
47
7
0
14 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under
  Low-resource Scenarios
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
25
2
0
13 Jun 2024
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with
  Representations from Speech Foundation Models
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Chun Yin
Tai-Shih Chi
Yu Tsao
Hsin-Min Wang
27
0
0
12 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical
  Emotion Vector for Controllable Emotional Text-to-Speech
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
24
6
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
40
15
0
11 Jun 2024
Noise-Robust Voice Conversion by Conditional Denoising Training Using
  Latent Variables of Recording Quality and Environment
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
Takuto Igarashi
Yuki Saito
Kentaro Seki
Shinnosuke Takamichi
Ryuichi Yamamoto
Kentaro Tachibana
Hiroshi Saruwatari
19
1
0
11 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
24
65
0
07 Jun 2024
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction
  and Waveform Generation
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
Hui-Peng Du
Ye-Xin Lu
Yang Ai
Zhen-Hua Ling
27
3
0
04 Jun 2024
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Wen-Chin Huang
Yi-Chiao Wu
T. Toda
27
1
0
20 May 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
29
10
0
28 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
30
16
0
23 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing
  Using Discrete Speech Unit Challenge
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
28
1
0
09 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting
  for Text-to-Speech Synthesis
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
21
23
0
04 Apr 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data
  Using Augmentation-Conditional Discriminator
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
19
0
0
25 Mar 2024
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing
  Using Discrete Speech Unit Challenge
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito
17
0
0
20 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
16
2
0
08 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
22
139
0
05 Mar 2024
Language-Codec: Reducing the Gaps Between Discrete Codec Representation
  and Speech Language Models
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Siqi Zheng
Qian Chen
Rongjie Huang
Jialung Zuo
Shulei Wang
Zhou Zhao
AuLLM
22
14
0
19 Feb 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible
  recipes, self-supervised front-ends, and off-the-shelf models
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
52
21
0
30 Jan 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech
  Generation Leveraging NLP Evaluation Metrics
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
Takaaki Saeki
Soumi Maiti
Shinnosuke Takamichi
Shinji Watanabe
Hiroshi Saruwatari
14
11
0
30 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
28
1
0
16 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis
  Conditioned on Self-supervised Discrete Speech Representations
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
17
20
0
22 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
17
31
0
21 Nov 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech
  and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge
  2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
8
7
0
08 Oct 2023
Partial Rank Similarity Minimization Method for Quality MOS Prediction
  of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
Hemant Yadav
Erica Cooper
Junichi Yamagishi
Sunayana Sitaram
R. Shah
6
0
0
08 Oct 2023
A Study on Incorporating Whisper for Robust Speech Assessment
A Study on Incorporating Whisper for Robust Speech Assessment
Ryandhimas E. Zezario
Yu-Wen Chen
Szu-Wei Fu
Yu Tsao
H. Wang
C. Fuh
25
10
0
22 Sep 2023
Exploring Sentence Type Effects on the Lombard Effect and
  Intelligibility Enhancement: A Comparative Study of Natural and Grid
  Sentences
Exploring Sentence Type Effects on the Lombard Effect and Intelligibility Enhancement: A Comparative Study of Natural and Grid Sentences
Hongyang Chen
Yuhong Yang
Qingmu Liu
Weiping Tu
Baifeng Li
Song Lin
11
0
0
19 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using
  Whisper and Metadata
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Ryandhimas E. Zezario
Fei Chen
C. Fuh
H. Wang
Yu Tsao
27
1
0
18 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic
  and acoustic features
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
16
3
0
15 Sep 2023
Previous
123
Next