SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

18 May 2023
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
    AuLLM
    MLLM
arXiv: 2305.11000

Papers citing "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities"

50 / 223 papers shown
Title
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
68
9
0
29 Nov 2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
J. Zhang
Guangzhi Sun
Lu Lu
Y. Wang
Chao Zhang
AuLLM
61
6
0
27 Nov 2024
AMPS: ASR with Multimodal Paraphrase Supervision
Amruta Parulekar
Abhishek Gupta
Sameep Chattopadhyay
P. Jyothi
72
0
0
27 Nov 2024
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim
Hyeongwoo Jeon
Jongseong Bae
Ha Young Kim
SLR
72
0
0
25 Nov 2024
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Language Models
Wanqi Yang
Y. Li
Meng Fang
Yunchao Wei
Tianyi Zhou
L. Chen
AAML
ELM
AuLLM
62
1
0
22 Nov 2024
MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models
Jianhong Tu
Zhuohao Ni
Nicholas Crispino
Zihao Yu
Michael Bendersky
...
Ruoxi Jia
Xin Liu
Lingjuan Lyu
Dawn Song
Chenguang Wang
VLM
MLLM
45
0
0
15 Nov 2024
StreamAdapter: Efficient Test Time Adaptation from Contextual Streams
Dilxat Muhtar
Yelong Shen
Y. Yang
Xiaodong Liu
Yadong Lu
...
Feng Sun
Xueliang Zhang
Jianfeng Gao
Weizhu Chen
Qi Zhang
TTA
54
0
0
14 Nov 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
39
2
0
14 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
B. Li
Yifei Xin
Linli Xu
30
10
0
04 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
21
8
0
04 Nov 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang
Yangze Li
Chaoyou Fu
Yunhang Shen
Lei Xie
Ke Li
Xing Sun
Long Ma
AuLLM
MLLM
29
25
0
01 Nov 2024
NeuGPT: Unified multi-modal Neural GPT
Yiqian Yang
Yiqun Duan
Hyejeong Jo
Qiang Zhang
Renjing Xu
Oiwi Parker Jones
Xuming Hu
Chin-Teng Lin
Hui Xiong
21
1
0
28 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Maohao Shen
Shun Zhang
Jilong Wu
Zhiping Xiu
Ehab AlBadawy
Yiting Lu
M. Seltzer
Qing He
26
2
0
27 Oct 2024
VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen
Xianghu Yue
Chen Zhang
Xiaoxue Gao
R. Tan
H. Li
ELM
AuLLM
26
17
0
22 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Md Mubtasim Ahasan
Md Fahim
Tasnim Mohiuddin
A K M Mahbubur Rahman
Aman Chadha
Tariq Iqbal
M. A. Amin
Md. Mofijul Islam
Amin Ahsan Ali
11
0
0
19 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
26
0
0
17 Oct 2024
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
Abhishek Gupta
Amruta Parulekar
Sameep Chattopadhyay
P. Jyothi
VLM
18
0
0
17 Oct 2024
Roadmap towards Superhuman Speech Understanding using Large Language Models
Fan Bu
Yuhao Zhang
X. Wang
Benyou Wang
Q. Liu
H. Li
LM&MA
ELM
AuLLM
30
1
0
17 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Xipeng Qiu
AuLLM
20
5
0
09 Oct 2024
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Yuto Nishimura
Takumi Hirose
Masanari Ohi
Hideki Nakayama
Nakamasa Inoue
VLM
21
1
0
06 Oct 2024
Self-Powered LLM Modality Expansion for Large Speech-Text Models
Tengfei Yu
Xuebo Liu
Zhiyi Hou
Liang Ding
Dacheng Tao
Min Zhang
27
0
0
04 Oct 2024
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
21
6
0
04 Oct 2024
Efficient Streaming LLM for Speech Recognition
J. Jia
Gil Keren
Wei Zhou
Egor Lakomkin
Xiaohui Zhang
Chunyang Wu
Frank Seide
Jay Mahadeokar
Ozlem Kalinli
AuLLM
20
0
0
02 Oct 2024
LASMP: Language Aided Subset Sampling Based Motion Planner
Saswati Bhattacharjee
Anirban Sinha
Chinwe Ekenna
LM&Ro
19
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
57
14
0
01 Oct 2024
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Yexing Du
Ziyang Ma
Yifan Yang
Keqi Deng
Xie Chen
Bo Yang
Yang Xiang
Ming Liu
Bing Qin
LRM
13
6
0
29 Sep 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
36
1
0
28 Sep 2024
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
Yiming Chen
Xianghu Yue
Xiaoxue Gao
Chen Zhang
L. F. D’Haro
R. Tan
Haizhou Li
AuLLM
20
0
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
38
11
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
50
21
0
26 Sep 2024
Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
Robin Shing-Hei Yuen
Timothy Tin-Long Tse
Jian Zhu
AuLLM
25
3
0
25 Sep 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
16
0
0
25 Sep 2024
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
Yang Yuhang
Peng Yizhou
Eng Siong Chng
Xionghu Zhong
AuLLM
AI4CE
14
0
0
24 Sep 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
AuLLM
OffRL
29
13
0
23 Sep 2024
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
31
1
0
23 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
28
4
0
17 Sep 2024
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data
Jing Xu
Daxin Tan
Jiaqi Wang
Xiao Chen
14
0
0
17 Sep 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
Ryota Komatsu
Takahiro Shinozaki
SSL
24
1
0
16 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
16
3
0
11 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
25
29
0
10 Sep 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition
Yaoxun Xu
Shi-Xiong Zhang
Jianwei Yu
Zhiyong Wu
Dong Yu
AuLLM
14
3
0
01 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
30
0
0
31 Aug 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
21
52
0
29 Aug 2024
SALSA: Speedy ASR-LLM Synchronous Aggregation
Ashish R. Mittal
Darshan Prabhu
Sunita Sarawagi
P. Jyothi
21
2
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
36
32
0
29 Aug 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
Yangze Li
Xiong Wang
Songjun Cao
Yike Zhang
Long Ma
Lei Xie
AuLLM
43
0
0
18 Aug 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
18
2
0
13 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
29
23
0
05 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
27
3
0
05 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
29
18
0
02 Aug 2024