Direct speech-to-speech translation with discrete units

12 July 2021
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
Xutai Ma
Adam Polyak
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu

Papers citing "Direct speech-to-speech translation with discrete units"

50 / 144 papers shown
RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
Zhisheng Zheng
Xiaohang Sun
Tuan Dinh
Abhishek Yanamandra
Abhinav Jain
...
Sunil Hadap
Vimal Bhat
Manoj Aggarwal
Gérard Medioni
David Harwath
179
0
0
26 Nov 2025
Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data
Physical Review X (PRX), 2025
Sina Rashidi
Hossein Sameti
120
0
0
16 Nov 2025
StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation
Xi Chen
Yuchen Song
Satoshi Nakamura
138
0
0
15 Oct 2025
MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
Jianjin Wang
Runsong Zhao
Xiaoqian Liu
Yuan Ge
Ziqiang Xu
Tong Xiao
Shengxiang Gao
Z. Yu
Jingbo Zhu
147
1
0
11 Oct 2025
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
Sitong Cheng
Weizhen Bian
Xinsheng Wang
Ruibin Yuan
Jianyi Chen
Shunshun Yin
Wenhan Luo
Wei Xue
199
0
0
25 Sep 2025
Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
Chutong Meng
Philipp Koehn
149
0
0
22 Sep 2025
GmSLM : Generative Marmoset Spoken Language Modeling
Talia Sternberg
Michael London
David Omer
Yossi Adi
AuLLM
238
0
0
11 Sep 2025
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
Qibing Bai
Sho Inoue
Shuai Wang
Zhongjie Jiang
Yannan Wang
Haizhou Li
245
3
0
23 Jul 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
264
1
0
18 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
292
2
0
12 Jun 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
333
0
0
27 May 2025
Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
317
0
0
23 May 2025
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuhao Zhang
Xiangnan Ma
Kaiqi Kou
Peizhuo Liu
Weiqiao Shan
Benyou Wang
Tong Xiao
Yuxin Huang
Zhengtao Yu
Jingbo Zhu
VLM
251
1
0
21 May 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
International Conference on Human Factors in Computing Systems (CHI), 2025
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
251
5
0
25 Apr 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
378
4
0
22 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
475
106
0
11 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
421
6
0
15 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yifan Liu
Yu Fang
Zhouhan Lin
321
4
0
07 Mar 2025
Speech to Speech Translation with Translatotron: A State of the Art Review
Jules R. Kala
Emmanuel Adetiba
Abdultaofeek Abayom
Oluwatobi E. Dare
Ayodele H. Ifijeh
597
0
0
21 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
1.1K
21
0
05 Feb 2025
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
423
5
0
01 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Interspeech (Interspeech), 2024
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
512
0
0
01 Feb 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
277
4
0
11 Jan 2025
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lucas Goncalves
Prashant Mathur
Xing Niu
Brady Houston
Chandrashekhar Lavania
Srikanth Vishnubhotla
Lijia Sun
Anthony Ferritto
340
2
0
21 Dec 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Spoken Language Technology Workshop (SLT), 2024
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
337
7
0
11 Nov 2024
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
Sparsh Jain
Ashwin Sankar
Devilal Choudhary
Dhairya Suman
Nikhil Narasimhan
Mohammed Safi Ur Rahman Khan
Anoop Kunchukuttan
Mitesh M. Khapra
Mary Dabre
579
2
0
07 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
367
6
0
31 Oct 2024
Phonology-Guided Speech-to-Speech Translation for African Languages
Speech Communication (Speech Commun.), 2024
P. Ochieng
D. Kaburu
376
0
0
30 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ella Zeldes
Or Tal
Yossi Adi
222
3
0
28 Oct 2024
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
Opeyemi Osakuade
Simon King
269
3
0
25 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Jiaqi Leng
AuLLM
384
16
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
International Conference on Learning Representations (ICLR), 2024
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
334
20
0
09 Oct 2024
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
Tuan Nam Nguyen
Ngoc-Quan Pham
A. Waibel
239
5
0
30 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
ACM Multimedia (MM), 2024
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
293
7
0
11 Sep 2024
Estimating the Completeness of Discrete Speech Units
Spoken Language Technology Workshop (SLT), 2024
Sung-Lin Yeh
Hao Tang
407
9
0
09 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
404
9
0
05 Sep 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
263
14
0
23 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
International Conference on Learning Representations (ICLR), 2024
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD, DiffM, AI4TS
400
17
0
14 Aug 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
243
0
0
08 Jul 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
279
13
0
16 Jun 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Interspeech (Interspeech), 2024
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
249
17
0
14 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
277
8
0
13 Jun 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Yuxun Tang
Yuning Wu
Jiatong Shi
Qin Jin
304
7
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Vasu Sharma
Jundong Li
Tariq Iqbal
277
0
0
13 Jun 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
Yuning Wu
Chunlei Zhang
Jiatong Shi
Yuxun Tang
Shan Yang
Qin Jin
328
15
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
267
31
0
11 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
295
4
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
244
13
0
11 Jun 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
297
20
0
11 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Interspeech (Interspeech), 2024
Avihu Dekel
Raul Fernandez
276
3
0
08 Jun 2024