Towards a Japanese Full-duplex Spoken Dialogue System

3 June 2025
Atsumoto Ohashi
Shinya Iizuka
Jingjing Jiang
Ryuichiro Higashinaka
    AuLLM
Main: 4 pages
4 figures
Bibliography: 1 page
4 tables
Abstract

Full-duplex spoken dialogue systems, which can model simultaneous bidirectional features of human conversations such as speech overlaps and backchannels, have attracted significant attention recently. However, research on full-duplex spoken dialogue systems for the Japanese language remains scarce. In this paper, we present the first publicly available full-duplex spoken dialogue model in Japanese, which is built upon Moshi, a full-duplex dialogue model in English. Our model is trained through a two-stage process: pre-training on large-scale spoken dialogue data in Japanese, followed by fine-tuning on high-quality stereo spoken dialogue data. We further enhance the model's performance by incorporating synthetic dialogue data generated by a multi-stream text-to-speech system. Evaluation experiments demonstrate that the trained model outperforms Japanese baseline models in both naturalness and meaningfulness.

@article{ohashi2025_2506.02979,
  title={Towards a Japanese Full-duplex Spoken Dialogue System},
  author={Atsumoto Ohashi and Shinya Iizuka and Jingjing Jiang and Ryuichiro Higashinaka},
  journal={arXiv preprint arXiv:2506.02979},
  year={2025}
}