ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.12995
  4. Cited By
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking
  Head

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

25 April 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
Zhenhui Ye
Yuning Wu
Zhiqing Hong
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
    LM&MA
    AuLLM
ArXivPDFHTML

Papers citing "AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head"

45 / 145 papers shown
Title
Equipping Pretrained Unconditional Music Transformers with Instrument
  and Genre Controls
Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls
Weihan Xu
Julian McAuley
Shlomo Dubnov
Hao-Wen Dong
24
1
0
21 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified
  Large-Scale Audio-Language Models
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
28
267
0
14 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
37
3
0
08 Nov 2023
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in
  Open Worlds
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng
Jiazheng Liu
Yicheng Feng
Zongqing Lu
40
29
0
20 Oct 2023
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative
  Editing
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang
Akira Maezawa
Gus Xia
Kazuhiko Yamamoto
Simon Dixon
44
17
0
19 Oct 2023
MusicAgent: An AI Agent for Music Understanding and Generation with
  Large Language Models
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Dingyao Yu
Kaitao Song
Peiling Lu
Tianyu He
Xu Tan
Wei Ye
Shikun Zhang
Jiang Bian
LLMAG
19
16
0
18 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for
  Speech Recognition and Translation
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
21
48
0
13 Oct 2023
Jigsaw: Supporting Designers to Prototype Multimodal Applications by
  Chaining AI Foundation Models
Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models
David Chuan-En Lin
Nikolas Martelaro
24
18
0
12 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
27
3
0
12 Oct 2023
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker
  Extraction
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
24
10
0
11 Oct 2023
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
Ziwei Chai
Tianjie Zhang
Liang Wu
Kaiqiao Han
Xiaohai Hu
Xuanwen Huang
Yang Yang
AI4MH
LRM
19
56
0
09 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
31
79
0
07 Oct 2023
uTalk: Bridging the Gap Between Humans and AI
uTalk: Bridging the Gap Between Humans and AI
Hussam Azzuni
Sharim Jamal
Abdulmotaleb Elsaddik
14
6
0
04 Oct 2023
Towards human-like spoken dialogue generation between AI agents from
  written dialogue
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
29
13
0
02 Oct 2023
Joint Audio and Speech Understanding
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
26
66
0
25 Sep 2023
Connecting Speech Encoder and Large Language Model for ASR
Connecting Speech Encoder and Large Language Model for ASR
Wenyi Yu
Changli Tang
Guangzhi Sun
Xianzhao Chen
T. Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
AuLLM
6
64
0
25 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language
  Feedback
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
125
139
0
19 Sep 2023
Instruction-Following Speech Recognition
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
19
6
0
18 Sep 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive
  Instruction-Tuning Benchmark for Speech
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
32
50
0
18 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
449
0
11 Sep 2023
Leveraging Large Language Models for Exploiting ASR Uncertainty
Leveraging Large Language Models for Exploiting ASR Uncertainty
Pranay Dighe
Yi Su
Shangshang Zheng
Yunshu Liu
Vineet Garg
Xiaochuan Niu
Ahmed H. Tewfik
13
12
0
09 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment
  of Continuation Writing
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
28
35
0
02 Sep 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
51
148
0
31 Aug 2023
LLaSM: Large Language and Speech Model
LLaSM: Large Language and Speech Model
Yu Shu
Siwei Dong
Guangyao Chen
Wen-Fen Huang
Ruihua Zhang
Daochen Shi
Qiqi Xiang
Yemin Shi
AuLLM
25
46
0
30 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
36
0
24 Aug 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Emanuele Bugliarello
Hernan Moraldo
Ruben Villegas
Mohammad Babaeizadeh
M. Saffar
Han Zhang
D. Erhan
V. Ferrari
Pieter-Jan Kindermans
P. Voigtlaender
VGen
28
10
0
22 Aug 2023
Refashioning Emotion Recognition Modelling: The Advent of Generalised
  Large Models
Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models
Zixing Zhang
Liyizhe Peng
Tao Pang
Jing Han
Huan Zhao
Bjorn W. Schuller
32
12
0
21 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering
  with Large Language Models
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
23
1
0
20 Aug 2023
WavJourney: Compositional Audio Creation with Large Language Models
WavJourney: Compositional Audio Creation with Large Language Models
Xubo Liu
Zhongkai Zhu
Haohe Liu
Yiitan Yuan
Meng Cui
...
Jinhua Liang
Yin Cao
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
AuLLM
21
25
0
26 Jul 2023
TableGPT: Towards Unifying Tables, Nature Language and Commands into One
  GPT
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Liangyu Zha
Junlin Zhou
Liyao Li
Rui Wang
Qingyi Huang
...
Xing-yan Deng
J. Xu
Haobo Wang
Gang Chen
J. Zhao
RALM
LMTD
32
42
0
17 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
24
6
0
17 Jul 2023
On decoder-only architecture for speech-to-text and large language model
  integration
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
22
117
0
08 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal
  Inputs?
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
19
71
0
05 Jul 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
67
8
0
24 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal
  Conversational Abilities
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
43
287
0
18 May 2023
Augmented Large Language Models with Parametric Knowledge Guiding
Augmented Large Language Models with Parametric Knowledge Guiding
Ziyang Luo
Can Xu
Pu Zhao
Xiubo Geng
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
KELM
RALM
35
44
0
08 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
23
4
0
08 May 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion
  Models
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
315
0
30 Jan 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for
  Noise-robust Expressive TTS
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
29
18
0
04 Nov 2022
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural
  Speaker Separation
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Zhong-Qiu Wang
Samuele Cornell
Shukjae Choi
Younglo Lee
Byeonghak Kim
Shinji Watanabe
66
96
0
08 Sep 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain
  Text-to-Speech
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
115
34
0
15 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
306
11,909
0
04 Mar 2022
Improving the Performance of Automated Audio Captioning via Integrating
  the Acoustic and Semantic Information
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
32
27
0
12 Oct 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable
  Sequence Tasks
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
35
30
0
02 May 2021
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020
Previous
123