ResearchTrend.AI


Speechless: Speech Instruction Training Without Speech for Low Resource Languages

23 May 2025
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
    SyDa

Papers citing "Speechless: Speech Instruction Training Without Speech for Low Resource Languages"

29 / 29 papers shown
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
Taesoo Kim
Jong Hwan Ko
AuLLM
15
0
0
01 Jun 2025
VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen
Xianghu Yue
Chen Zhang
Xiaoxue Gao
R. Tan
Haoyang Li
ELM, AuLLM
118
29
0
22 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM, VLM
135
5
0
20 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
87
16
0
03 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
189
25
0
01 Oct 2024
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
AuLLM
161
150
0
17 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
112
7
0
17 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
117
54
0
10 Sep 2024
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
139
5
0
18 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura
Ryuichi Yamamoto
Yuma Shirahata
Takuya Hasumi
Kentaro Tachibana
VLM
73
12
0
12 Jun 2024
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
83
1
0
10 Jun 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Navonil Majumder
Chia-Yu Hung
Deepanway Ghosal
Wei-Ning Hsu
Rada Mihalcea
Soujanya Poria
137
61
0
15 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
161
403
0
06 Apr 2024
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Jing Pan
Jian Wu
Yashesh Gaur
S. Sivasankaran
Zhuo Chen
Shujie Liu
Jinyu Li
ELM
84
29
0
03 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA, AuLLM
110
264
0
20 Oct 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
202
2,333
0
12 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
124
45
0
02 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
297
1,525
0
27 Jul 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,760
0
06 Dec 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
101
23
0
21 Oct 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
M. Huzaifah
Ivan Kukanov
88
8
0
04 Apr 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
148
202
0
14 Oct 2021
SD-QA: Spoken Dialectal Question Answering for the Real World
Fahim Faisal
Sharlina Keshava
ibn Alam
Antonios Anastasopoulos
145
32
0
24 Sep 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
154
512
0
07 Dec 2020
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
102
1,622
0
13 Dec 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
128
959
0
05 Apr 2019
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
125
1,571
0
08 Sep 2018
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces
Yu-An Chung
W. Weng
S. Tong
James R. Glass
82
100
0
18 May 2018
Ray: A Distributed Framework for Emerging AI Applications
Philipp Moritz
Robert Nishihara
Stephanie Wang
Alexey Tumanov
Richard Liaw
...
Melih Elibol
Zongheng Yang
William Paul
Michael I. Jordan
Ion Stoica
GNN
128
1,269
0
16 Dec 2017