MOSNet: Deep Learning based Objective Assessment for Voice Conversion

17 April 2019
Chen-Chou Lo
Szu-Wei Fu
Wen-Chin Huang
Xin Wang
Junichi Yamagishi
Yu Tsao
H. Wang

Papers citing "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

50 / 140 papers shown
Watermarking Autoregressive Image Generation
Nikola Jovanović
Ismail Labiad
Tomáš Souček
Martin Vechev
Pierre Fernandez
WIGM
40
0
0
19 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
36
0
0
18 Jun 2025
A Study on Speech Assessment with Visual Cues
Shafique Ahmed
Ryandhimas E. Zezario
Nasir Saleem
Amir Hussain
H. Wang
Yu Tsao
67
0
0
11 Jun 2025
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
Jakaria Islam Emon
Kazi Tamanna Alam
Md Abu Salek
27
0
0
06 Jun 2025
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
Saurabh Agrawal
Raj Gohil
Gopal Kumar Agrawal
Vikram C M
Kushal Verma
31
0
0
02 Jun 2025
ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation
Jiatong Shi
Yifan Cheng
Bo-Hao Su
Hye-jin Shim
Jinchuan Tian
Samuele Cornell
Yiwen Zhao
Siddhant Arora
Shinji Watanabe
57
0
0
30 May 2025
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit
Wen-Chin Huang
Erica Cooper
Tomoki Toda
103
1
0
21 May 2025
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
Junchuan Zhao
Xintong Wang
Ye Wang
38
0
0
21 May 2025
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
Zhicheng Lian
Lizhi Wang
Hua Huang
76
1
0
29 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
155
14
0
11 Apr 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Yansen Wang
Chao Zhang
AuLLM
162
2
0
26 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
80
1
0
21 Mar 2025
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer
Moritz Böhle
Gabriel de Marmiesse
Laurent Mazaré
Neil Zeghidour
Alexandre Défossez
P. Pérez
AuLLM VLM
127
0
0
19 Mar 2025
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
123
16
0
29 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Yongqian Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
Hao Fei
Hao Yang
Jinsong Su
130
0
0
20 Nov 2024
SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
81
1
0
18 Nov 2024
Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
63
0
0
17 Nov 2024
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
Wen-Chin Huang
Erica Cooper
Tomoki Toda
78
10
0
06 Nov 2024
Continuous Speech Tokenizer in Text To Speech
Yixing Li
Ruobing Xie
Xingwu Sun
Yu Cheng
Zhanhui Kang
AuLLM CLL
128
2
0
22 Oct 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
Evgeny Burnaev
OT
55
2
0
17 Oct 2024
Enhancing Crowdsourced Audio for Text-to-Speech Models
José Giraldo
Martí Llopart-Font
Alex Peiró-Lilja
Carme Armentano-Oller
Gerard Sant
Baybars Külebi
DiffM
62
0
0
17 Oct 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM LM&MA
133
8
0
25 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM SSL
100
5
0
22 Sep 2024
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
Kaito Baba
Wataru Nakata
Yuki Saito
Hiroshi Saruwatari
VLM
106
17
0
14 Sep 2024
Measuring the Accuracy of Automatic Speech Recognition Solutions
Korbinian Kuhn
Verena Kersken
Benedikt Reuter
Niklas Egger
Gottfried Zimmermann
66
22
0
29 Aug 2024
Direction of Arrival Correction through Speech Quality Feedback
Caleb Rascon
25
0
0
13 Aug 2024
UNQA: Unified No-Reference Quality Assessment for Audio, Image, Video, and Audio-Visual Content
Yuhang Cao
Xiongkuo Min
Yixuan Gao
Wei Sun
Weisi Lin
Guangtao Zhai
74
2
0
29 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
65
1
0
24 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
137
19
0
30 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
81
11
0
16 Jun 2024
Brilla AI: AI Contestant for the National Science and Maths Quiz
George Boateng
Jonathan Abrefah Mensah
Kevin Takyi Yeboah
William Edor
Andrew Kojo Mensah-Onumah
Naafi Dasana Ibrahim
Nana Sam Yeboah
48
2
0
04 Mar 2024
Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids
Jasper Kirton-Wingate
Shafique Ahmed
Adeel Hussain
M. Gogate
K. Dashtipour
Jen-Cheng Hou
Tassadaq Hussain
Yu Tsao
Amir Hussain
55
0
0
26 Feb 2024
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
Szu-Wei Fu
Kuo-Hsuan Hung
Yu Tsao
Yu-Chiang Frank Wang
SSL
76
13
0
26 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
102
7
0
25 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
96
13
0
01 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
80
4
0
31 Jan 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
Takaaki Saeki
Soumi Maiti
Shinnosuke Takamichi
Shinji Watanabe
Hiroshi Saruwatari
90
27
0
30 Jan 2024
Multi-objective Non-intrusive Hearing-aid Speech Assessment Model
Hsin-Tien Chiang
Szu-Wei Fu
Hsin-Min Wang
Yu Tsao
John H. L. Hansen
77
4
0
15 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
46
0
0
07 Nov 2023
CORN: Co-Trained Full- And No-Reference Speech Quality Assessment
Pranay Manocha
Donald Williamson
Adam Finkelstein
42
1
0
13 Oct 2023
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
Hemant Yadav
Erica Cooper
Junichi Yamagishi
Sunayana Sitaram
R. Shah
66
0
0
08 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
77
4
0
01 Oct 2023
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation
Wen Wu
Jiajun He
Chuxu Zhang
P. Woodland
57
1
0
30 Sep 2023
A Study on Incorporating Whisper for Robust Speech Assessment
Ryandhimas E. Zezario
Yu-Wen Chen
Szu-Wei Fu
Yu Tsao
H. Wang
C. Fuh
95
13
0
22 Sep 2023
Exploring Sentence Type Effects on the Lombard Effect and Intelligibility Enhancement: A Comparative Study of Natural and Grid Sentences
Hongyang Chen
Yuhong Yang
Qingmu Liu
Weiping Tu
Baifeng Li
Song Lin
33
0
0
19 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
83
6
0
15 Sep 2023
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh
Yamini Sinha
Ingo Siegert
Sebastian Stober
64
1
0
15 Sep 2023
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh
Arnab Das
Yamini Sinha
Ingo Siegert
Tim Polzehl
Sebastian Stober
56
4
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM AuLLM
125
69
0
14 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
105
47
0
10 Sep 2023