Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.11834
Cited By
Pengi: An Audio Language Model for Audio Tasks
19 May 2023
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Pengi: An Audio Language Model for Audio Tasks"
20 / 120 papers shown
Title
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
30
21
0
12 Oct 2023
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner
Simon Durand
Daniel Stoller
Rachel M. Bittner
AuLLM
15
14
0
11 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
20
46
0
07 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Muhammad Ahmed Shah
Roshan S. Sharma
Hira Dhamyal
R. Olivier
Ankit Shah
...
Massa Baali
Soham Deshmukh
Michael Kuhlmann
Bhiksha Raj
Rita Singh
AAML
17
19
0
02 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
37
56
0
30 Sep 2023
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
16
6
0
18 Sep 2023
Training Audio Captioning Models without Audio
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
11
9
0
14 Sep 2023
Natural Language Supervision for General-Purpose Audio Representations
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
AuLLM
AI4TS
11
53
0
11 Sep 2023
Leveraging Large Language Models for Exploiting ASR Uncertainty
Pranay Dighe
Yi Su
Shangshang Zheng
Yunshu Liu
Vineet Garg
Xiaochuan Niu
Ahmed H. Tewfik
13
12
0
09 Sep 2023
Zero-Shot Audio Captioning via Audibility Guidance
Tal Shaharabany
Ariel Shaulov
Lior Wolf
13
4
0
07 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
24
36
0
24 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
15
1
0
20 Aug 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
19
115
0
08 Jul 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
515
0
23 Jun 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
26
133
0
18 May 2023
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
107
50
0
28 Sep 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
111
262
0
02 Feb 2022
NaRLE: Natural Language Models using Reinforcement Learning with Emotion Feedback
Ruijie Zhou
Soham Deshmukh
Jeremiah Greer
Charles Lee
8
7
0
05 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
Previous
1
2
3