ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.11000
  4. Cited By
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal
  Conversational Abilities

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

18 May 2023
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
    AuLLM
    MLLM
ArXivPDFHTML

Papers citing "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities"

50 / 223 papers shown
Title
Generative Expressive Conversational Speech Synthesis
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
38
5
0
31 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
22
4
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep
  Speaker Representation Learning
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
19
4
0
21 Jul 2024
Audio-visual training for improved grounding in video-text LLMs
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
21
0
0
21 Jul 2024
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie-jin Yang
Xuesong Niu
Nan Jiang
Ruimao Zhang
Siyuan Huang
17
9
0
17 Jul 2024
Qwen2-Audio Technical Report
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
24
100
0
15 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and
  Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video
  Understanding
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
12
0
0
11 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference
  Optimization
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
35
3
0
02 Jul 2024
An End-to-End Speech Summarization Using Large Language Model
An End-to-End Speech Summarization Using Large Language Model
Hengchao Shang
Zongyao Li
Jiaxin Guo
Shaojun Li
Zhiqiang Rao
Yuanchang Luo
Daimeng Wei
Hao Yang
26
0
0
02 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of
  Two Worlds in GPT and T5
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
23
10
0
28 Jun 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech
  Units: A Pilot Study
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
29
2
0
27 Jun 2024
FASA: a Flexible and Automatic Speech Aligner for Extracting
  High-quality Aligned Children Speech Data
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data
Dancheng Liu
Jinjun Xiong
11
0
0
25 Jun 2024
A Comprehensive Solution to Connect Speech Encoder and Large Language
  Model for ASR
A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR
Van Tung Pham
Yist Y. Lin
Tao Han
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AuLLM
24
1
0
25 Jun 2024
Decoder-only Architecture for Streaming End-to-end Speech Recognition
Decoder-only Architecture for Streaming End-to-end Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
RALM
AuLLM
23
6
0
23 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
S.
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
85
17
0
23 Jun 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
42
2
0
22 Jun 2024
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Neeraj Gaur
Zhong Meng
19
3
0
20 Jun 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in
  Self-Supervised Models
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
19
1
0
20 Jun 2024
Transferable speech-to-text large language model alignment module
Transferable speech-to-text large language model alignment module
Boyong Wu
Chao Yan
Haoran Pu
27
0
0
19 Jun 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible
  Acoustic Reception and Reaction
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
16
4
0
18 Jun 2024
Towards Audio Codec-based Speech Separation
Towards Audio Codec-based Speech Separation
J. Yip
Shengkui Zhao
Dianwen Ng
Eng Siong Chng
Bin Ma
14
0
0
18 Jun 2024
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency
  Spoken Dialogue Systems
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Kentaro Mitsui
Koh Mitsuda
Toshiaki Wakatsuki
Yukiya Hono
Kei Sawada
23
2
0
18 Jun 2024
Towards an End-to-End Framework for Invasive Brain Signal Decoding with
  Large Language Models
Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models
Sheng Feng
Heyang Liu
Yu Wang
Yanfeng Wang
14
3
0
17 Jun 2024
Improving Quotation Attribution with Fictional Character Embeddings
Improving Quotation Attribution with Fictional Character Embeddings
Gaspard Michel
Elena V. Epure
Romain Hennequin
Christophe Cerisara
14
0
0
17 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot
  Audio Task Learner
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
31
15
0
14 Jun 2024
GPT-4o: Visual perception performance of multimodal large language
  models in piglet activity understanding
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
Yiqi Wu
Xiaodan Hu
Ziming Fu
Siling Zhou
Jiangong Li
MLLM
14
9
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech
  Units for Spoken Language Understanding
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
28
2
0
13 Jun 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
TokSing: Singing Voice Synthesis based on Discrete Tokens
Yuning Wu
Chunlei Zhang
Jiatong Shi
Yuxun Tang
Shan Yang
Qin Jin
14
6
0
12 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
27
6
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for
  Competitiveness with Single-task Models
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
21
0
0
12 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
18
6
0
10 Jun 2024
Prompting Large Language Models with Audio for General-Purpose Speech
  Summarization
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
Wonjune Kang
Deb Roy
LRM
16
7
0
10 Jun 2024
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li
Yusen Zhang
Renze Lou
Chen Wu
Jiaqi Wang
LRM
AAML
24
11
0
10 Jun 2024
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak
Adel Moumen
Richard Dufour
Mickael Rouvier
ELM
LM&MA
MedIm
21
0
0
09 Jun 2024
BLSP-Emo: Towards Empathetic Large Speech-Language Models
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Chen Wang
Minpeng Liao
Zhongqiang Huang
Junhong Wu
Chengqing Zong
Jiajun Zhang
VLM
AuLLM
30
2
0
06 Jun 2024
Multimodal Reasoning with Multimodal Knowledge Graph
Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee
Yequan Wang
Jing Li
Min Zhang
29
14
0
04 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical
  Transformer
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
26
0
0
03 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
33
1
0
02 Jun 2024
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge
  Distillation
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jiajun Zhang
ALM
AuLLM
32
1
0
29 May 2024
M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion
  Comprehension and Generation
M3^33GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation
Mingshuang Luo
Ruibing Hou
Hong Chang
Zimo Liu
Yaowei Wang
Shiguang Shan
27
11
0
25 May 2024
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Zhenwei Shao
Zhou Yu
Jun Yu
Xuecheng Ouyang
Lihao Zheng
Zhenbiao Gai
Mingyang Wang
Jiajun Ding
21
9
0
20 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic
  Speech Recognition with Large Language Models
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
E. Chng
Ruizhe Li
AuLLM
KELM
22
5
0
16 May 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large
  Language Models
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
19
3
0
14 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
Katrin Kirchhoff
33
37
0
14 May 2024
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng
Tianyi Xu
Kun Wei
Bingshen Mu
Hongfei Xue
...
Pengcheng Guo
Yuhang Dai
Longhao Li
Mingchen Shao
Lei Xie
30
9
0
03 May 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
35
16
0
24 Apr 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALM
AuLLM
32
4
0
08 Apr 2024
Scaling Properties of Speech Language Models
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
21
9
0
31 Mar 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
42
0
31 Mar 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
29
7
0
22 Feb 2024
Previous
12345
Next