2305.11000
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
18 May 2023
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
Papers citing
"SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities"
23 / 223 papers shown
Title
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
8
54
0
14 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
12
448
0
11 Sep 2023
Leveraging Large Language Models for Exploiting ASR Uncertainty
Pranay Dighe
Yi Su
Shangshang Zheng
Yunshu Liu
Vineet Garg
Xiaochuan Niu
Ahmed H. Tewfik
11
12
0
09 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
20
11
0
02 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
4
22
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
17
61
0
31 Aug 2023
Enhancing Subtask Performance of Multi-modal Large Language Model
Yongqiang Zhao
Zhenyu Li
Feng Zhang
Xinhai Xu
Donghong Liu
LRM
11
0
0
31 Aug 2023
LLaSM: Large Language and Speech Model
Yu Shu
Siwei Dong
Guangyao Chen
Wen-Fen Huang
Ruihua Zhang
Daochen Shi
Qiqi Xiang
Yemin Shi
AuLLM
17
46
0
30 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
14
36
0
24 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
44
14
0
23 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
13
1
0
20 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
8
218
0
10 Aug 2023
SVIT: Scaling up Visual Instruction Tuning
Bo-Lu Zhao
Boya Wu
Muyang He
Tiejun Huang
MLLM
20
120
0
09 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
14
115
0
08 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
15
217
0
29 Jun 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
6
20
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
16
515
0
23 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
6
944
0
05 Jun 2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Tianrui Wang
Long Zhou
Zi-Hua Zhang
Yu-Huan Wu
Shujie Liu
Yashesh Gaur
Zhuo Chen
Jinyu Li
Furu Wei
24
100
0
25 May 2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
29
269
0
25 May 2023
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
13
30
0
24 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
21
133
0
18 May 2023
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
172
336
0
01 Feb 2021