UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

15 December 2022

Papers citing "UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units"

RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech

...

151

26 Nov 2025

Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data

Physical Review X (PRX), 2025

Sina Rashidi

Hossein Sameti

114

16 Nov 2025

MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction

144

11 Oct 2025

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

193

25 Sep 2025

Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents

Chutong Meng

Philipp Koehn

137

22 Sep 2025

PRIM: Towards Practical In-Image Multilingual Machine Translation

182

05 Sep 2025

End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data

188

19 Jun 2025

Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics

377

14 Jun 2025

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs

287

12 Jun 2025

Exploring In-Image Machine Translation with Real-World Background

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

223

21 May 2025

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

249

21 May 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

372

22 Apr 2025

Scaling Analysis of Interleaved Speech-Text Language Models

489

03 Apr 2025

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations

IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024

415

15 Mar 2025

Speech to Speech Translation with Translatotron: A State of the Art Review

589

21 Feb 2025

High-Fidelity Simultaneous Speech-To-Speech Translation

1.1K

05 Feb 2025

Discrete Speech Unit Extraction via Independent Component Analysis

265

11 Jan 2025

Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?

Conference on Machine Translation (WMT), 2024

195

31 Oct 2024

Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR

201

17 Oct 2024

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

Interspeech (Interspeech), 2024

Nameer Hirschkind

Xiao Yu

Joseph Liu

Eloi DuBois

...

189

14 Jun 2024

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

Yang Feng

289

11 Jun 2024

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Shaolei Zhang

Yang Feng

242

11 Jun 2024

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Zhengrui Ma

Qingkai Fang

Shaolei Zhang

Shoutao Guo

Yang Feng

Min Zhang

278

11 Jun 2024

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Zhijun Liu

Haizhou Li

203

08 Jun 2024

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Shaolei Zhang

Yang Feng

293

05 Jun 2024

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

275

04 Jun 2024

SimulTron: On-Device Simultaneous Speech to Speech Translation

Ye Jia

Michelle Tadmor Ramanovich

207

04 Jun 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

...

373

28 May 2024

DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation

Weiting Tan

Jingyu Zhang

Lingfeng Shen

Daniel Khashabi

Philipp Koehn

286

22 May 2024

Direct Punjabi to English speech translation using discrete units

Prabhjot Kaur

L. A. M. Bush

Weisong Shi

253

25 Feb 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

401

25 Feb 2024

Towards audio language modeling -- an overview

Haibin Wu

Xuanjun Chen

Yi-Cheng Lin

Hung-yi Lee

318

20 Feb 2024

TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Seung-Bin Kim

Sang-Hoon Lee

Seong-Whan Lee

213

17 Jan 2024

GSQA: An End-to-End Model for Generative Spoken Question Answering

Interspeech (Interspeech), 2023

Guan-Ting Lin

Hung-yi Lee

293

15 Dec 2023

Efficient Monotonic Multihead Attention

203

07 Dec 2023

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Computer Vision and Pattern Recognition (CVPR), 2023

447

05 Dec 2023

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

Neural Information Processing Systems (NeurIPS), 2023

Qingkai Fang

Yan Zhou

Yangzhou Feng

254

11 Oct 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

International Conference on Learning Representations (ICLR), 2023

Jiatong Shi

311

04 Oct 2023

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

B. Grimstad

Xuankai Chang

Antonios Anastasopoulos

Yuya Fujita

Shinji Watanabe

358

27 Sep 2023

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jeong Hun Yeo

235

15 Sep 2023

Sparks of Large Audio Models: A Survey and Outlook

...

Björn W. Schuller

819

24 Aug 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

265

03 Aug 2023

Multilingual Speech-to-Speech Translation into Multiple Target Languages

236

17 Jul 2023

Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models

Interspeech (Interspeech), 2023

310

01 Jun 2023

Intelligible Lip-to-Speech Synthesis with Speech Units

Interspeech (Interspeech), 2023

J. Choi

Minsu Kim

Y. Ro

316

31 May 2023

Translatotron 3: Speech to Speech Translation with Monolingual Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Eliya Nachmani

Alon Levkovitch

Yi-Yang Ding

Chulayutsh Asawaroengchai

Heiga Zen

Michelle Tadmor Ramanovich

382

27 May 2023

Duplex Diffusion Models Improve Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xianchao Wu

263

22 May 2023

DUB: Discrete Unit Back-translation for Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

268

19 May 2023

Back Translation for Speech-to-text Translation Without Transcripts

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Qingkai Fang

Yang Feng

288

15 May 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Jiatong Shi

...

278

10 Apr 2023