ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13009
  4. Cited By
Textually Pretrained Speech Language Models

Textually Pretrained Speech Language Models

22 May 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
Felix Kreuk
Jade Copet
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
    VLM
    SyDa
ArXivPDFHTML

Papers citing "Textually Pretrained Speech Language Models"

50 / 58 papers shown
Title
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
45
0
0
05 May 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
47
2
0
11 Apr 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
52
0
0
09 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
43
0
0
03 Apr 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
72
1
0
08 Mar 2025
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Tianpeng Li
J. Liu
Tao Zhang
Yuanbo Fang
Da Pan
...
Guosheng Dong
Jianhua Xu
Haoze Sun
Zenan Zhou
Weipeng Chen
AuLLM
53
3
0
24 Feb 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
Avishai Elmakies
Omri Abend
Yossi Adi
28
1
0
08 Jan 2025
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
26
0
0
31 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Ella Zeldes
Or Tal
Yossi Adi
27
0
0
28 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for
  Speech Generation
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Maohao Shen
Shun Zhang
Jilong Wu
Zhiping Xiu
Ehab AlBadawy
Yiting Lu
M. Seltzer
Qing He
36
2
0
27 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Md Mubtasim Ahasan
Md Fahim
Tasnim Mohiuddin
A K M Mahbubur Rahman
Aman Chadha
Tariq Iqbal
M. A. Amin
Md. Mofijul Islam
Amin Ahsan Ali
23
0
0
19 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice
  Interaction Abilities
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Xipeng Qiu
AuLLM
33
6
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
32
0
0
09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David F. Harwath
45
3
0
05 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
27
3
0
30 Sep 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
26
0
0
25 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
49
5
0
11 Sep 2024
LAST: Language Model Aware Speech Tokenization
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
24
2
0
05 Sep 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing
  Tasks
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
39
3
0
23 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and
  Translation via Language Model and Synthetic Data
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
37
2
0
01 Aug 2024
Discrete Flow Matching
Discrete Flow Matching
Itai Gat
Tal Remez
Neta Shaul
Felix Kreuk
Ricky T. Q. Chen
Gabriel Synnaeve
Yossi Adi
Y. Lipman
DiffM
39
57
0
22 Jul 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
22
6
0
16 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
40
6
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
43
15
0
11 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
40
15
0
08 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
36
2
0
08 Jun 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with
  Dual-Decoding Product-Quantized Variational Auto-Encoder
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Haohan Guo
Fenglong Xie
Dongchao Yang
Hui Lu
Xixin Wu
Helen Meng
53
6
0
05 Jun 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Vicky Zayats
Peter Chen
Melissa Ferrari
Dirk Padfield
AI4CE
30
0
0
29 May 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Matthew Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
33
4
0
02 Apr 2024
The Larger the Better? Improved LLM Code-Generation via Budget
  Reallocation
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid
Tal Remez
Jonas Gehring
Roy Schwartz
Yossi Adi
34
20
0
31 Mar 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless
  Speech-to-Speech Translation with Speaker Style Preservation
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
28
4
0
19 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned
  Speech Synthesis
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
30
1
0
19 Mar 2024
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
Peiyuan Liu
Hang Guo
Tao Dai
Naiqi Li
Jigang Bao
Xudong Ren
Yong Jiang
Shu-Tao Xia
AI4TS
62
16
0
12 Mar 2024
Towards audio language modeling -- an overview
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
30
28
0
20 Feb 2024
LVCHAT: Facilitating Long Video Comprehension
LVCHAT: Facilitating Long Video Comprehension
Yu-Xiang Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
26
4
0
19 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
44
32
0
08 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
25
3
0
31 Jan 2024
GSQA: An End-to-End Model for Generative Spoken Question Answering
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
19
2
0
15 Dec 2023
Exploring In-Context Learning of Textless Speech Language Model for
  Speech Classification Tasks
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
Ming-Hao Hsu
Kai-Wei Chang
Shang-Wen Li
Hung-yi Lee
26
7
0
19 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech
  Model
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
21
6
0
04 Oct 2023
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model
  Adaptation
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv
Itai Gat
Sagie Benaim
Lior Wolf
Idan Schwartz
Yossi Adi
DiffM
VGen
29
36
0
28 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard
  Parameter Sharing
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
18
2
0
27 Sep 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
13
5
0
25 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech
  recognition/synthesis and speech/text continuation tasks
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
16
54
0
14 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
36
0
24 Aug 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
35
257
0
22 Jun 2023
SpeechGen: Unlocking the Generative Power of Speech Language Models with
  Prompts
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
Haibin Wu
Kai-Wei Chang
Yuan-Kuei Wu
Hung-yi Lee
19
22
0
03 Jun 2023
Spoken Question Answering and Speech Continuation Using
  Spectrogram-Powered LLM
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
19
30
0
24 May 2023
12
Next