Textless Speech-to-Speech Translation on Real Data

15 December 2021
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
Changhan Wang
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu

Papers citing "Textless Speech-to-Speech Translation on Real Data"

50 / 110 papers shown
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
49
2
0
11 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
83
0
0
15 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
63
0
0
14 Mar 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
61
0
0
01 Feb 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro Velázquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
74
1
0
05 Dec 2024
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
Shih-Heng Wang
Zih-Ching Chen
Jiatong Shi
Ming To Chuang
Guan-Ting Lin
Kuan Po Huang
David F. Harwath
Shang-Wen Li
Hung-yi Lee
76
1
0
27 Nov 2024
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
33
0
0
30 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
69
21
0
26 Sep 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
26
0
0
25 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
34
2
0
16 Sep 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
Ryota Komatsu
Takahiro Shinozaki
SSL
34
1
0
16 Sep 2024
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Yao-Fei Cheng
Li-Wei Chen
Hung-Shin Lee
Hsin-Min Wang
21
0
0
13 Sep 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
30
0
0
09 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
24
2
0
05 Sep 2024
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
Xi Chen
Songyang Zhang
Qibing Bai
Kai-xiang Chen
Satoshi Nakamura
AuLLM
35
6
0
22 Jul 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
32
1
0
20 Jun 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
22
6
0
16 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
30
3
0
13 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
40
6
0
12 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
52
0
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
35
1
0
11 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
62
8
0
10 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
36
2
0
08 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
27
1
0
04 Jun 2024
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
Hongyu Gong
Bandhav Veluri
44
0
0
30 May 2024
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Medha Hira
Arnav Goel
Anubha Gupta
20
1
0
23 May 2024
SBAAM! Eliminating Transcript Dependency in Automatic Subtitling
Marco Gaido
Sara Papi
Matteo Negri
Mauro Cettolo
L. Bentivogli
35
1
0
17 May 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
28
4
0
19 Mar 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
28
6
0
25 Feb 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
26
0
0
25 Feb 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
48
11
0
23 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
51
32
0
08 Feb 2024
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
Yuejiao Wang
Xixin Wu
Disong Wang
Lingwei Meng
Helen M. Meng
35
5
0
26 Jan 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
25
5
0
18 Jan 2024
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
22
4
0
17 Jan 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
28
2
0
23 Dec 2023
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel
Antony D'Avirro
Adina Williams
Emmanuel Dupoux
30
3
0
21 Dec 2023
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
28
2
0
15 Dec 2023
Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus
Yi-Hui Chou
Kalvin Chang
Meng-Ju Wu
Winston Ou
Alice Wen-Hsin Bi
...
Iu-Tshian Phoann
Winnie Chang
Chenxuan Cui
Noel Chen
Jiatong Shi
37
3
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
25
12
0
05 Dec 2023
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
Yongxin Zhu
Zhujin Gao
Xinyuan Zhou
Zhongyi Ye
Linli Xu
26
2
0
26 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
26
19
0
12 Oct 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Qingkai Fang
Yan Zhou
Yang Feng
32
6
0
11 Oct 2023
Enhancing expressivity transfer in textless speech-to-speech translation
J. Duret
Benjamin O’Brien
Yannick Esteve
Titouan Parcollet
43
2
0
11 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
26
3
0
09 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRM
ELM
34
15
0
09 Oct 2023
Evaluating Self-Supervised Speech Representations for Indigenous American Languages
Chih-Chen Chen
William Chen
Rodolfo Zevallos
John E. Ortega
34
7
0
05 Oct 2023
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
Kuan-Po Huang
Chih-Kai Yang
Yu-Kuan Fu
Ewan Dunbar
Hung-yi Lee
29
5
0
04 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
H. Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
45
24
0
04 Oct 2023