ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2107.05604
  4. Cited By
Direct speech-to-speech translation with discrete units

Direct speech-to-speech translation with discrete units

12 July 2021
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
Xutai Ma
Adam Polyak
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
ArXivPDFHTML

Papers citing "Direct speech-to-speech translation with discrete units"

42 / 42 papers shown
Title
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
31
0
0
25 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
83
0
0
15 Mar 2025
Speech to Speech Translation with Translatotron: A State of the Art Review
Speech to Speech Translation with Translatotron: A State of the Art Review
Jules R. Kala
Emmanuel Adetiba
Abdultaofeek Abayom
Oluwatobi E. Dare
Ayodele H. Ifijeh
157
0
0
21 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
168
0
0
05 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
61
0
0
01 Feb 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
32
0
0
09 Oct 2024
Cognitively Inspired Energy-Based World Models
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
36
0
0
13 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
45
15
0
11 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
52
0
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without
  Parallel Speech Data?
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
40
1
0
11 Jun 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous
  Speech-to-Any Translation
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
53
9
0
11 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for
  Noise-Robust Expressive Speech-to-Speech Translation
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
27
1
0
04 Jun 2024
Direct Punjabi to English speech translation using discrete units
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
26
0
0
25 Feb 2024
GSQA: An End-to-End Model for Generative Spoken Question Answering
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
33
2
0
15 Dec 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
26
20
0
12 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
26
3
0
09 Oct 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
70
8
0
24 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net
  Encoder With Multiple STFTs
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
19
1
0
18 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
30
13
0
15 May 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for
  Universal and Generalized Speech Enhancement
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
27
11
0
21 Dec 2022
Speaking Style Conversion in the Waveform Domain Using Discrete
  Self-Supervised Units
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
21
13
0
19 Dec 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual
  Speech-to-Speech Translations
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswani
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
37
34
0
08 Nov 2022
Audio Language Modeling using Perceptually-Guided Discrete
  Representations
Audio Language Modeling using Perceptually-Guided Discrete Representations
Felix Kreuk
Yaniv Taigman
Adam Polyak
Jade Copet
Gabriel Synnaeve
Alexandre Défossez
Yossi Adi
27
4
0
02 Nov 2022
Textless Direct Speech-to-Speech Translation with Discrete Speech
  Representation
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
Xinjian Li
Ye Jia
Chung-Cheng Chiu
23
23
0
31 Oct 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to
  Speech Translation
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
Kun Wei
Long Zhou
Zi-Hua Zhang
Liping Chen
Shujie Liu
Lei He
Jinyu Li
Furu Wei
14
13
0
31 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero
  Resource Speech Challenge
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
32
30
0
27 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken
  sentence embeddings
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
30
2
0
23 Oct 2022
A Textless Metric for Speech-to-Speech Comparison
A Textless Metric for Speech-to-Speech Comparison
Laurent Besacier
S. Ribeiro
Olivier Galibert
Ioan Calapodescu
33
5
0
21 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder
  Based Speech-Text Pre-training
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
61
57
0
07 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on
  Fixed-Point Iteration
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
42
29
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
27
289
0
30 Sep 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised
  Pre-training and Data Augmentation
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
20
56
0
06 Apr 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired
  Speech Data
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Junyi Ao
Zi-Hua Zhang
Long Zhou
Shujie Liu
Haizhou Li
Tom Ko
Lirong Dai
Jinyu Li
Yao Qian
Furu Wei
SSL
19
19
0
31 Mar 2022
Leveraging unsupervised and weakly-supervised data to improve direct
  speech-to-speech translation
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
Ye Jia
Yifan Ding
Ankur Bapna
Colin Cherry
Yu Zhang
Alexis Conneau
Nobuyuki Morioka
39
20
0
24 Mar 2022
Measuring the Impact of Individual Domain Factors in Self-Supervised
  Pre-Training
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training
Ramon Sanabria
Wei-Ning Hsu
Alexei Baevski
Michael Auli
19
7
0
01 Mar 2022
Textless Speech-to-Speech Translation on Real Data
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
26
142
0
15 Dec 2021
From Start to Finish: Latency Reduction Strategies for Incremental
  Speech Synthesis in Simultaneous Speech-to-Speech Translation
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu
Changhan Wang
Hongyu Gong
Xutai Ma
Yun Tang
J. Pino
17
4
0
15 Oct 2021
Translatotron 2: High-quality direct speech-to-speech translation with
  voice preservation
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
26
67
0
19 Jul 2021
Generative Spoken Language Modeling from Raw Audio
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
185
337
0
01 Feb 2021
Exploring wav2vec 2.0 on speaker verification and language
  identification
Exploring wav2vec 2.0 on speaker verification and language identification
Zhiyun Fan
Meng Li
Shiyu Zhou
Bo Xu
103
202
0
11 Dec 2020
1