
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

6 May 2022

Yuan Gong

Jingbo Yu

James R. Glass


Papers citing "Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition"

47 / 47 papers shown

MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

210

02 Dec 2025

DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation

Weichuang Shao

I. Liao

Tomas Henrique Bode Maul

T. Chandesa

TTA

274

23 Nov 2025

LongCat-Flash-Omni Technical Report

...

665

31 Oct 2025

AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch

Weichuang Shao

I. Liao

Tomas Henrique Bode Maul

T. Chandesa

175

22 Oct 2025

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

...

498

15 Oct 2025

Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models

394

26 Sep 2025

Benchmarking Gaslighting Attacks Against Speech Large Language Models

107

24 Sep 2025

SAM: A Mamba-2 State-Space Audio-Language Model

209

19 Sep 2025

SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

...

291

18 Sep 2025

Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

138

17 Sep 2025

AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation

294

02 Sep 2025

IS$^3$: Generic Impulsive--Stationary Sound Separation in Acoustic Scenes using Deep Filtering

Clémentine Berger

Paraskevas Stamatiadis

Roland Badeau

S. Essid

189

01 Sep 2025

CodecBench: A Comprehensive Benchmark for Acoustic and Semantic Evaluation

166

28 Aug 2025

OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue

...

252

13 Aug 2025

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

311

07 Aug 2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions

527

06 Aug 2025

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations

271

06 Aug 2025

From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs

309

03 Aug 2025

Step-Audio 2 Technical Report

...

357

22 Jul 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

...

269

31 May 2025

StressTest: Can YOUR Speech LM Handle the Stress?

Iddo Yosha

Gallil Maimon

Yossi Adi

290

28 May 2025

From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data

Chun-Yi Kuan

Hung-yi Lee

AuLLM

367

26 May 2025

Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities

Yu-Gang Jiang

270

23 May 2025

X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

489

22 May 2025

Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples

Chun-Yi Kuan

Hung-yi Lee

389

20 May 2025

Kimi-Audio Technical Report

...

565

161

25 Apr 2025

Transformation of audio embeddings into interpretable, concept-based representations

Alice Zhang

Edison Thomaz

Lie Lu

296

18 Apr 2025

voc2vec: A Foundation Model for Non-Verbal Vocalization

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Alkis Koudounas

Moreno La Quatra

Marco Sabato Siniscalchi

Elena Baralis

302

22 Feb 2025

Soundwave: Less is More for Speech-Text Alignment in LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

322

18 Feb 2025

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

...

405

23 Jan 2025

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Chun-Yi Kuan

Hung-yi Lee

AuLLM LRM

485

03 Jan 2025

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

276

23 Dec 2024

MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models

Kun Wang

Xuming Hu

363

07 Oct 2024

OmniBench: Towards The Future of Universal Omni-Language Models

...

767

23 Sep 2024

DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Dongheon Lee

Jung-Woo Choi

Mamba

229

19 Sep 2024

D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack

IEEE International Joint Conference on Neural Network (IJCNN), 2024

375

11 Sep 2024

Qwen2-Audio Technical Report

Yunfei Chu

Jin Xu

...

Chang Zhou

Jingren Zhou

AuLLM VLM

450

498

15 Jul 2024

Domain Adaptation for Contrastive Audio-Language Models

Soham Deshmukh

Rita Singh

Bhiksha Raj

VLM

267

14 Feb 2024

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

253

26 Jan 2024

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Yunfei Chu

Jin Xu

Xiaohuan Zhou

Qian Yang

Shiliang Zhang

Zhijie Yan

Chang Zhou

Jingren Zhou

AuLLM

448

700

14 Nov 2023

Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

Automatic Speech Recognition & Understanding (ASRU), 2023

348

25 Sep 2023

Natural Language Supervision for General-Purpose Audio Representations

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

301

114

11 Sep 2023

Listen, Think, and Understand

International Conference on Learning Representations (ICLR), 2023

831

241

18 May 2023

Active Learning of Non-semantic Speech Tasks with Pretrained Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

358

31 Oct 2022

On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zaharah Bukhsh

Aaqib Saeed

OODD

162

27 Oct 2022

Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices

Pattern Recognition Letters (PRL), 2022

Harlin Lee

Aaqib Saeed

326

12 Jul 2022

Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data

Knowledge Discovery and Data Mining (KDD), 2020

Apinan Hasthanasombat

620

434

10 Jun 2020