Direct speech-to-speech translation with discrete units

12 July 2021

Yossi Adi

Papers citing "Direct speech-to-speech translation with discrete units"

50 / 144 papers shown

RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech

...

179

26 Nov 2025

Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data

Physical Review X (PRX), 2025

Sina Rashidi

Hossein Sameti

120

16 Nov 2025

StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

Xi Chen

Yuchen Song

Satoshi Nakamura

138

15 Oct 2025

MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction

147

11 Oct 2025

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

199

25 Sep 2025

Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents

Chutong Meng

Philipp Koehn

149

22 Sep 2025

GmSLM: Generative Marmoset Spoken Language Modeling

238

11 Sep 2025

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

245

23 Jul 2025

Factorized RVQ-GAN For Disentangled Speech Tokenization

...

264

18 Jun 2025

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs

292

12 Jun 2025

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

Jeongsoo Choi

Jaehun Kim

Joon Son Chung

333

27 May 2025

Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer

Soumya Dutta

Avni Jain

Sriram Ganapathy

317

23 May 2025

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

251

21 May 2025

Spatial Speech Translation: Translating Across Space With Binaural Hearables

International Conference on Human Factors in Computing Systems (CHI), 2025

251

25 Apr 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

378

22 Apr 2025

On The Landscape of Spoken Language Models: A Comprehensive Survey

475

106

11 Apr 2025

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations

IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024

421

15 Mar 2025

DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Yifan Liu

Yu Fang

Zhouhan Lin

321

07 Mar 2025

Speech to Speech Translation with Translatotron: A State of the Art Review

597

21 Feb 2025

High-Fidelity Simultaneous Speech-To-Speech Translation

1.1K

05 Feb 2025

When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation

423

01 Feb 2025

A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation

Interspeech, 2024

512

01 Feb 2025

Discrete Speech Unit Extraction via Independent Component Analysis

277

11 Jan 2025

Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Chandrashekhar Lavania

Srikanth Vishnubhotla

Lijia Sun

Anthony Ferritto

340

21 Dec 2024

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition

Spoken Language Technology Workshop (SLT), 2024

337

11 Nov 2024

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages

Mohammed Safi Ur Rahman Khan

Anoop Kunchukuttan

Mitesh M. Khapra

Mary Dabre

579

07 Nov 2024

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

367

31 Oct 2024

Phonology-Guided Speech-to-Speech Translation for African Languages

Speech Communication (Speech Commun.), 2024

P. Ochieng

D. Kaburu

376

30 Oct 2024

Enhancing TTS Stability in Hebrew using Discrete Semantic Units

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ella Zeldes

Or Tal

Yossi Adi

222

28 Oct 2024

Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?

Opeyemi Osakuade

Simon King

269

25 Oct 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Xin Zhang

Xiang Lyu

Zhihao Du

Qian Chen

Dong Zhang

...

Yuxuan Wang

384

09 Oct 2024

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

International Conference on Learning Representations (ICLR), 2024

Cheol Jun Cho

Nicholas Lee

Akshat Gupta

Dhruv Agarwal

Ethan Chen

Alan W Black

Gopala K. Anumanchipalli

334

09 Oct 2024

Accent conversion using discrete units with parallel data synthesized from controllable accented TTS

Tuan Nam Nguyen

Ngoc-Quan Pham

A. Waibel

239

30 Sep 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

ACM Multimedia (MM), 2024

Yuning Wu

Jiatong Shi

Shinji Watanabe

293

11 Sep 2024

Estimating the Completeness of Discrete Speech Units

Spoken Language Technology Workshop (SLT), 2024

Sung-Lin Yeh

Hao Tang

407

09 Sep 2024

LAST: Language Model Aware Speech Tokenization

A. Turetzky

Yossi Adi

404

05 Sep 2024

SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024

Kai-Wei Chang

Haibin Wu

Yu-Kai Wang

Hung-yi Lee

263

23 Aug 2024

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

International Conference on Learning Representations (ICLR), 2024

Sang-Hoon Lee

Ha-Yeong Choi

Seong-Whan Lee

400

14 Aug 2024

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation

J. Duret

Yannick Esteve

Titouan Parcollet

243

08 Jul 2024

NAST: Noise Aware Speech Tokenization for Speech Language Models

Shoval Messica

Yossi Adi

279

16 Jun 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

Interspeech, 2024

Jiatong Shi

Xutai Ma

Hirofumi Inaguma

Anna Y. Sun

Shinji Watanabe

249

14 Jun 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

Xiao Chen

277

13 Jun 2024

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Yuxun Tang

Yuning Wu

Jiatong Shi

Qin Jin

304

13 Jun 2024

Cognitively Inspired Energy-Based World Models

277

13 Jun 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens

Jiatong Shi

328

12 Jun 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Xuankai Chang

Jiatong Shi

Jinchuan Tian

Yuning Wu

Yuxun Tang

Yihan Wu

Shinji Watanabe

Yossi Adi

Xie Chen

Qin Jin

267

11 Jun 2024

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

Yang Feng

295

11 Jun 2024

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Shaolei Zhang

Yang Feng

244

11 Jun 2024

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Zhengrui Ma

Qingkai Fang

Shaolei Zhang

Shoutao Guo

Yang Feng

Min Zhang

297

11 Jun 2024

Exploring the Benefits of Tokenization of Discrete Acoustic Units

Interspeech, 2024

Avihu Dekel

Raul Fernandez

276

08 Jun 2024