ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.12995
  4. Cited By
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking
  Head

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

25 April 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
Zhenhui Ye
Yuning Wu
Zhiqing Hong
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
    LM&MA
    AuLLM
ArXivPDFHTML

Papers citing "AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head"

50 / 145 papers shown
Title
A Comprehensive Review of Multimodal Large Language Models: Performance
  and Challenges Across Different Tasks
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
42
30
0
02 Aug 2024
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language
  Models
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Junda Wu
Xintong Li
Tong Yu
Yu-Xiang Wang
Xiang Chen
Jiuxiang Gu
Lina Yao
Jingbo Shang
Julian McAuley
37
0
0
29 Jul 2024
Audio-visual training for improved grounding in video-text LLMs
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
27
0
0
21 Jul 2024
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie-jin Yang
Xuesong Niu
Nan Jiang
Ruimao Zhang
Siyuan Huang
30
9
0
17 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing
  via Task Decomposition, Modularization, and Program Generation
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
39
5
0
13 Jul 2024
SoupLM: Model Integration in Large Language and Multi-Modal Models
SoupLM: Model Integration in Large Language and Multi-Modal Models
Yue Bai
Zichen Zhang
Jiasen Lu
Yun Fu
MoMe
22
1
0
11 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based
  Speech Recognition
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
44
19
0
05 Jul 2024
Investigating the Effects of Large-Scale Pseudo-Stereo Data and
  Different Speech Foundation Model on Dialogue Generative Spoken Language
  Model
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
Yu-Kuan Fu
Cheng-Kuang Lee
Hsiu-Hsuan Wang
Hung-yi Lee
22
0
0
02 Jul 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text
  Alignment
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLM
AuLLM
31
9
0
27 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
S.
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
85
20
0
23 Jun 2024
Transferable speech-to-text large language model alignment module
Transferable speech-to-text large language model alignment module
Boyong Wu
Chao Yan
Haoran Pu
35
0
0
19 Jun 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible
  Acoustic Reception and Reaction
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
29
4
0
18 Jun 2024
LLaNA: Large Language and NeRF Assistant
LLaNA: Large Language and NeRF Assistant
Andrea Amaduzzi
Pierluigi Zama Ramirez
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
36
2
0
17 Jun 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and
  Complex Reasoning Abilities
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Chandra Kiran Reddy Evuru
Utkarsh Tyagi
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
LRM
46
36
0
17 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
40
6
0
12 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
26
6
0
10 Jun 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge
  Bases
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
32
5
0
30 May 2024
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
Peng Wang
Songshuo Lu
Yaohua Tang
Sijie Yan
Yuanjun Xiong
Wei Xia
AuLLM
26
10
0
29 May 2024
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge
  Distillation
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jiajun Zhang
ALM
AuLLM
40
4
0
29 May 2024
C3LLM: Conditional Multimodal Content Generation Using Large Language
  Models
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
Zixuan Wang
Qinkai Duan
Yu-Wing Tai
Chi-Keung Tang
38
3
0
25 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
Katrin Kirchhoff
39
37
0
14 May 2024
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng
Tianyi Xu
Kun Wei
Bingshen Mu
Hongfei Xue
...
Pengcheng Guo
Yuhang Dai
Longhao Li
Mingchen Shao
Lei Xie
36
9
0
03 May 2024
Audio Dialogues: Dialogues dataset for audio and music understanding
Audio Dialogues: Dialogues dataset for audio and music understanding
Arushi Goel
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
AuLLM
29
4
0
11 Apr 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering
  Using a VLM
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
43
50
0
27 Mar 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
38
11
0
18 Mar 2024
GraphInstruct: Empowering Large Language Models with Graph Understanding
  and Reasoning Capability
GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability
Zihan Luo
Xiran Song
Hong Huang
Jianxun Lian
Chenhao Zhang
Jinqi Jiang
Xing Xie
LRM
29
30
0
07 Mar 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D
  Talking Face Generation
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
32
5
0
25 Feb 2024
Budget-Constrained Tool Learning with Planning
Budget-Constrained Tool Learning with Planning
Yuanhang Zheng
Peng Li
Mingshi Yan
Ji Zhang
Fei Huang
Yang Janet Liu
32
3
0
25 Feb 2024
Large Multimodal Agents: A Survey
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
LLMBind: A Unified Modality-Task Integration Framework
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu
Munan Ning
Peng Jin
Bin Lin
Jinfa Huang
...
Junwu Zhang
Zhenyu Tang
Mingjun Pan
Xing Zhou
Li-ming Yuan
MLLM
32
6
0
22 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and
  Respond Properly in Spoken Conversations
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
22
0
20 Feb 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative
  Comprehension
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
44
57
0
12 Feb 2024
Professional Agents -- Evolving Large Language Models into Autonomous
  Experts with Human-Level Competencies
Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies
Zhixuan Chu
Yan Wang
Feng Zhu
Lu Yu
Longfei Li
Jinjie Gu
LLMAG
16
8
0
06 Feb 2024
User Intent Recognition and Satisfaction with Large Language Models: A
  User Study with ChatGPT
User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT
Anna Bodonhelyi
Efe Bozkir
Shuo Yang
Enkelejda Kasneci
Gjergji Kasneci
ELM
AI4MH
28
16
0
03 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and
  Dialogue Abilities
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Wei Ping
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
59
73
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David F. Harwath
LRM
35
14
0
02 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
28
9
0
02 Feb 2024
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated
  Student Essay Detection
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection
Xinlin Peng
Ying Zhou
Ben He
Le Sun
Yingfei Sun
DeLMO
18
11
0
01 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
175
0
24 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
X. Li
Luisa Verdoliva
Shu Hu
75
56
0
22 Jan 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Zhenhui Ye
Tianyun Zhong
Yi Ren
Jiaqi Yang
Weichuang Li
...
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
24
44
0
16 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
21
7
0
05 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
32
144
0
28 Dec 2023
Towards Message Brokers for Generative AI: Survey, Challenges, and
  Opportunities
Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities
Alaa Saleh
Roberto Morabito
Sasu Tarkoma
Susanna Pirttikangas
Lauri Lovén
58
3
0
22 Dec 2023
SECap: Speech Emotion Captioning with Large Language Model
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu
Hangting Chen
Jianwei Yu
Qiaochu Huang
Zhiyong Wu
Shixiong Zhang
Guangzhi Li
Yi Luo
Rongzhi Gu
20
22
0
16 Dec 2023
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing
  Semi-structured Data for Large Language Model Reasoning
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning
Yuan Sui
Jiaru Zou
Mengyu Zhou
Xinyi He
Lun Du
Shi Han
Dongmei Zhang
LRM
LMTD
16
23
0
14 Dec 2023
HeadArtist: Text-conditioned 3D Head Generation with Self Score
  Distillation
HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
Hongyu Liu
Xuan Wang
Ziyu Wan
Yujun Shen
Yibing Song
Jing Liao
Qifeng Chen
DiffM
36
17
0
12 Dec 2023
Audio-Visual LLM for Video Understanding
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
19
37
0
11 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and
  Generation
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
71
35
0
05 Dec 2023
ChatPose: Chatting about 3D Human Pose
ChatPose: Chatting about 3D Human Pose
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
26
38
0
30 Nov 2023
Previous
123
Next