
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

18 May 2023
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
    AuLLM
    MLLM
arXiv: 2305.11000

Papers citing "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities"

50 / 223 papers shown
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
16
22
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
19
114
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
28
11
0
19 Feb 2024
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
Xinbei Ma
Zhuosheng Zhang
Hai Zhao
LLMAG
25
21
0
19 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
42
37
0
13 Feb 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
41
56
0
12 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
36
32
0
08 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
21
19
0
08 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
15
9
0
02 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
68
53
0
02 Feb 2024
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
Yuanhuiyi Lyu
Xueye Zheng
Lin Wang
DiffM
12
9
0
31 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
173
0
24 Jan 2024
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
Dong Zhang
Xin Zhang
Jun Zhan
Shimin Li
Yaqian Zhou
Xipeng Qiu
AuLLM
BDL
35
2
0
24 Jan 2024
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang
Fengqing Jiang
Zidi Xiong
Bhaskar Ramasubramanian
Radha Poovendran
Bo Li
LRM
SILM
24
15
0
20 Jan 2024
GroundingGPT:Language Enhanced Multi-modal Grounding Model
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
18
36
0
11 Jan 2024
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
Dong Zhang
Zhaowei Li
Pengyu Wang
Xin Zhang
Yaqian Zhou
Xipeng Qiu
LLMAG
22
2
0
08 Jan 2024
The Art of Deception: Robust Backdoor Attack using Dynamic Stacking of Triggers
Orson Mengara
AAML
28
3
0
03 Jan 2024
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
Hongfei Xue
Yuhao Liang
Bingshen Mu
Shiliang Zhang
Mengzhe Chen
Qian Chen
Lei Xie
AuLLM
13
9
0
31 Dec 2023
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
21
18
0
30 Dec 2023
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Ankur Gandhe
Chao-Han Huck Yang
Yile Gu
Shalini Ghosh
A. Stolcke
Hung-yi Lee
I. Bulyko
8
12
0
23 Dec 2023
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
20
2
0
23 Dec 2023
Speech Translation with Large Language Models: An Industrial Practice
Zhichao Huang
Rong Ye
Tom Ko
Qianqian Dong
Shanbo Cheng
Mingxuan Wang
Hang Li
42
15
0
21 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
19
54
0
14 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
27
10
0
13 Dec 2023
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
Zheng Lian
Licai Sun
Haiyang Sun
Kang Chen
Zhuofan Wen
Hao Gu
Bin Liu
Jianhua Tao
15
27
0
07 Dec 2023
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Yukiya Hono
Koh Mitsuda
Tianyu Zhao
Kentaro Mitsui
Toshiaki Wakatsuki
Kei Sawada
AuLLM
21
8
0
06 Dec 2023
ChatPose: Chatting about 3D Human Pose
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
18
34
0
30 Nov 2023
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
Xuxin Cheng
Bowen Cao
Qichen Ye
Zhihong Zhu
Hongxiang Li
Yuexian Zou
10
25
0
19 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
11
263
0
14 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
29
1
0
08 Nov 2023
Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Shiliang Zhang
Chong Deng
Yukun Ma
Hai Yu
Jiaqing Liu
Chong Zhang
8
5
0
08 Nov 2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Jing Pan
Jian Wu
Yashesh Gaur
S. Sivasankaran
Zhuo Chen
Shujie Liu
Jinyu Li
ELM
16
25
0
03 Nov 2023
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model
Yongqiang Zhao
Zhenyu Li
Zhi Jin
Feng Zhang
Haiyan Zhao
Chengfeng Dou
Zhengwei Tao
Xinhai Xu
Donghong Liu
11
4
0
31 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
28
195
0
20 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
10
48
0
13 Oct 2023
SeqXGPT: Sentence-Level AI-Generated Text Detection
Pengyu Wang
Linyang Li
Ke Ren
Botian Jiang
Dong Zhang
Xipeng Qiu
DeLMO
13
48
0
13 Oct 2023
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
11
10
0
11 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
18
12
0
09 Oct 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
17
1
0
09 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
18
46
0
07 Oct 2023
uTalk: Bridging the Gap Between Humans and AI
Hussam Azzuni
Sharim Jamal
Abdulmotaleb Elsaddik
9
6
0
04 Oct 2023
Tuning Large language model for End-to-end Speech Translation
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xu Yang
Dan Qu
Xiaolin Jiao
13
8
0
03 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
16
13
0
02 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
37
56
0
30 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
8
65
0
25 Sep 2023
Connecting Speech Encoder and Large Language Model for ASR
Wenyi Yu
Changli Tang
Guangzhi Sun
Xianzhao Chen
T. Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
AuLLM
6
64
0
25 Sep 2023
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model
Xinyu Zhou
Delong Chen
Yudong Chen
AuLLM
16
0
0
20 Sep 2023
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
11
6
0
18 Sep 2023
Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model
Ziqi Yang
Xuhai Xu
Bingsheng Yao
Shao Zhang
Ethan Rogers
Stephen Intille
N. Shara
G. Gao
Dakuo Wang
LM&MA
AI4MH
9
4
0
17 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
VLM
AuLLM
RALM
25
9
0
16 Sep 2023