ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.08295
  4. Cited By
SpeechVerse: A Large-scale Generalizable Audio Language Model

SpeechVerse: A Large-scale Generalizable Audio Language Model

14 May 2024
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
Prashant Mathur
Jie Yuan
Dhanush Bekal
Xing Niu
Sai Muralidhar Jayanthi
Xilai Li
Karel Mundnich
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
Katrin Kirchhoff
ArXivPDFHTML

Papers citing "SpeechVerse: A Large-scale Generalizable Audio Language Model"

35 / 35 papers shown
Title
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
R. Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
36
0
0
12 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
36
1
0
11 Apr 2025
Qwen2.5-Omni Technical Report
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
X. Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
86
12
0
26 Mar 2025
Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions
Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions
Vikramjit Mitra
Amrit Romana
Dung T. Tran
Erdrin Azemi
33
0
0
24 Mar 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
60
0
0
24 Feb 2025
Chain-of-Description: What I can understand, I can put into words
Chain-of-Description: What I can understand, I can put into words
J. Guo
Daimeng Wei
Z. Li
Hengchao Shang
Yuanchang Luo
Hao Yang
45
0
0
22 Feb 2025
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Borui Liao
Yulong Xu
Jiao Ou
Kaiyuan Yang
Weihua Jian
Pengfei Wan
Di Zhang
AuLLM
62
0
0
20 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Soundwave: Less is More for Speech-Text Alignment in LLMs
Y. Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
H. Li
AuLLM
SyDa
VLM
98
0
0
18 Feb 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Ailin Huang
Boyong Wu
Bruce Wang
Chao Yan
Chen Hu
...
Tianyu Wang
Wenjin Deng
Wuxun Xie
Weipeng Ming
Wenqing He
AuLLM
71
6
0
17 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
95
5
0
28 Jan 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
1
0
28 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman Hagai Aronowitz
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
54
0
0
15 Jan 2025
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
Anurag Kumar
Rohit Paturi
Amber Afshan
S. Srinivasan
41
0
0
14 Jan 2025
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
M. Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
46
1
0
06 Nov 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
57
3
0
20 Oct 2024
Roadmap towards Superhuman Speech Understanding using Large Language
  Models
Roadmap towards Superhuman Speech Understanding using Large Language Models
Fan Bu
Yuhao Zhang
X. Wang
Benyou Wang
Q. Liu
H. Li
LM&MA
ELM
AuLLM
33
1
0
17 Oct 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in
  Multimodal Large Language Models
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang
Jiahao Huo
Yibo Yan
Kun Wang
Yutao Yue
Xuming Hu
31
2
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
57
14
0
01 Oct 2024
PALM: Few-Shot Prompt Learning for Audio Language Models
PALM: Few-Shot Prompt Learning for Audio Language Models
Asif Hanif
M. Agro
Mohammad Areeb Qazi
Hanan Aldarmaki
VLM
18
1
0
29 Sep 2024
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large
  Language Models
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
Yiming Chen
Xianghu Yue
Xiaoxue Gao
Chen Zhang
L. F. D’Haro
R. Tan
Haizhou Li
AuLLM
30
0
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
42
11
0
26 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
44
0
0
22 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
25
29
0
10 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
W. Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
A. Aw
AuLLM
65
1
0
10 Sep 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
29
52
0
29 Aug 2024
On Pre-training of Multimodal Language Models Customized for Chart
  Understanding
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
36
4
0
19 Jul 2024
Qwen2-Audio Technical Report
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
29
100
0
15 Jul 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
S.
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
85
17
0
23 Jun 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large
  Language Models
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
29
3
0
14 May 2024
SLM: Bridge the thin gap between speech and text foundation models
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
37
56
0
30 Sep 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
  Processing
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
110
192
0
14 Oct 2021
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,108
0
02 Sep 2021
1