UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022


5 April 2022
Takaaki Saeki
Detai Xin
Wataru Nakata
Tomoki Koriyama
Shinnosuke Takamichi
Hiroshi Saruwatari
arXiv:2204.02152

Papers citing "UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022"

16 / 116 papers shown
Title
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Haibo Wang
Shiwan Zhao
Xiguang Zheng
Yong Qin
13
11
0
31 Aug 2023
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
Ryandhimas E. Zezario
B. Bai
C. Fuh
Hsin-Min Wang
Yu Tsao
11
3
0
18 Aug 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
8
45
0
26 Jun 2023
MOSPC: MOS Prediction Based on Pairwise Comparison
Kexin Wang
Yunlong Zhao
Qianqian Dong
Tom Ko
Mingxuan Wang
11
5
0
18 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
17
75
0
01 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
9
3
0
01 Jun 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
19
17
0
30 Jan 2023
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
Ondřej Plátek
Ondrej Dusek
11
2
0
17 Jan 2023
SpeechLMScore: Evaluating speech generation using speech language model
Soumi Maiti
Yifan Peng
Takaaki Saeki
Shinji Watanabe
ALM
11
30
0
08 Dec 2022
Iterative autoregression: a novel trick to improve your low-latency speech enhancement model
Pavel Andreev
Nicholas Babaev
Azat Saginbaev
Ivan Shchekotov
Aibek Alanov
6
4
0
03 Nov 2022
Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features
Alexandra Vioni
Georgia Maniati
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
Aimilios Chalamandaris
Pirros Tsiakoulis
11
5
0
01 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
11
6
0
26 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
22
17
0
12 Oct 2022
Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
Detai Xin
Shinnosuke Takamichi
Hiroshi Saruwatari
12
14
0
21 Jun 2022
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
Qiantong Xu
Alexei Baevski
Michael Auli
VLM
20
77
0
23 Sep 2021
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics
Hirokazu Kameoka
Takuhiro Kaneko
Kou Tanaka
Nobukatsu Hojo
Shogo Seki
DiffM
15
21
0
06 Oct 2020