2212.08055
Cited By
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
15 December 2022
Hirofumi Inaguma
Sravya Popuri
Ilia Kulikov
Peng-Jen Chen
Changhan Wang
Yu-An Chung
Yun Tang
Ann Lee
Shinji Watanabe
J. Pino
Papers citing
"UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units"
50 / 53 papers shown
RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
Zhisheng Zheng
Xiaohang Sun
Tuan Dinh
Abhishek Yanamandra
Abhinav Jain
...
Sunil Hadap
Vimal Bhat
Manoj Aggarwal
Gérard Medioni
David Harwath
151
0
0
26 Nov 2025
Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data
Physical Review X (PRX), 2025
Sina Rashidi
Hossein Sameti
114
0
0
16 Nov 2025
MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
Jianjin Wang
Runsong Zhao
Xiaoqian Liu
Yuan Ge
Ziqiang Xu
Tong Xiao
Shengxiang Gao
Z. Yu
Jingbo Zhu
144
0
0
11 Oct 2025
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
Sitong Cheng
Weizhen Bian
Xinsheng Wang
Ruibin Yuan
Jianyi Chen
Shunshun Yin
Wenhan Luo
Wei Xue
193
0
0
25 Sep 2025
Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
Chutong Meng
Philipp Koehn
137
0
0
22 Sep 2025
PRIM: Towards Practical In-Image Multilingual Machine Translation
Yanzhi Tian
Zeming Liu
Zhengyang Liu
Chong Feng
Xin Li
Heyan Huang
Yuhang Guo
VLM
182
2
0
05 Sep 2025
End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data
Aishwarya Pothula
Bhavana Akkiraju
Srihari Bandarupalli
Charan D
Santosh Kesiraju
Anil Kumar Vuppala
188
1
0
19 Jun 2025
Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics
Asifullah Khan
Muhammad Zaeem Khan
Saleha Jamshed
Sadia Ahmad
Aleesha Zainab
Kaynat Khatib
Faria Bibi
Abdul Rehman
OffRL
LRM
377
4
0
14 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
287
1
0
12 Jun 2025
Exploring In-Image Machine Translation with Real-World Background
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yanzhi Tian
Zeming Liu
Zhengyang Liu
Yuhang Guo
DiffM
VLM
223
4
0
21 May 2025
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuhao Zhang
Xiangnan Ma
Kaiqi Kou
Peizhuo Liu
Weiqiao Shan
Benyou Wang
Tong Xiao
Yuxin Huang
Zhengtao Yu
Jingbo Zhu
VLM
249
1
0
21 May 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
372
3
0
22 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
489
7
0
03 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
415
6
0
15 Mar 2025
Speech to Speech Translation with Translatotron: A State of the Art Review
Jules R. Kala
Emmanuel Adetiba
Abdultaofeek Abayom
Oluwatobi E. Dare
Ayodele H. Ifijeh
589
0
0
21 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
1.1K
19
0
05 Feb 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
265
4
0
11 Jan 2025
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
Conference on Machine Translation (WMT), 2024
Ioannis Tsiamas
Matthias Sperber
Andrew Finch
Sarthak Garg
195
8
0
31 Oct 2024
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
Abhishek Gupta
Amruta Parulekar
Sameep Chattopadhyay
Preethi Jyothi
VLM
201
0
0
17 Oct 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Interspeech (Interspeech), 2024
Nameer Hirschkind
Xiao Yu
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
189
1
0
14 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
289
4
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
242
11
0
11 Jun 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
278
20
0
11 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
203
36
0
08 Jun 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang
Qingkai Fang
Shoutao Guo
Zhengrui Ma
Min Zhang
Yang Feng
293
23
0
05 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
275
4
0
04 Jun 2024
SimulTron: On-Device Simultaneous Speech to Speech Translation
A. Agranovich
Eliya Nachmani
Oleg Rybakov
Yifan Ding
Ye Jia
Nadav Bar
Heiga Zen
Michelle Tadmor Ramanovich
207
0
0
04 Jun 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
373
15
0
28 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
286
1
0
22 May 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
253
2
0
25 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
401
8
0
25 Feb 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
318
65
0
20 Feb 2024
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
213
6
0
17 Jan 2024
GSQA: An End-to-End Model for Generative Spoken Question Answering
Interspeech (Interspeech), 2023
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
293
11
0
15 Dec 2023
Efficient Monotonic Multihead Attention
Xutai Ma
Anna Y. Sun
Siqi Ouyang
Hirofumi Inaguma
Paden Tomasello
203
7
0
07 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
447
16
0
05 Dec 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Neural Information Processing Systems (NeurIPS), 2023
Qingkai Fang
Yan Zhou
Yang Feng
254
17
0
11 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
International Conference on Learning Representations (ICLR), 2023
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
311
38
0
04 Oct 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
358
5
0
27 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
235
9
0
15 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Xiaoshi Zhong
Björn W. Schuller
LM&MA
AuLLM
819
56
0
24 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
265
10
0
03 Aug 2023
Multilingual Speech-to-Speech Translation into Multiple Target Languages
Hongyu Gong
Ning Dong
Sravya Popuri
Vedanuj Goswami
Ann Lee
J. Pino
236
5
0
17 Jul 2023
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
Interspeech (Interspeech), 2023
Liam Dugan
Anshul Wadhawan
Kyle Spence
Chris Callison-Burch
Morgan McGuire
Victor Zordan
OffRL
310
2
0
01 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
Interspeech (Interspeech), 2023
J. Choi
Minsu Kim
Y. Ro
316
39
0
31 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayutsh Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
382
25
0
27 May 2023
Duplex Diffusion Models Improve Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xianchao Wu
DiffM
263
6
0
22 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
268
34
0
19 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
288
17
0
15 May 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
278
23
0
10 Apr 2023