Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
6 May 2022
Yuan Gong
Jingbo Yu
James R. Glass
ArXiv (abs) | PDF | HTML | GitHub (163★)

Papers citing "Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition"

47 / 47 papers shown
MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Youxin Pang
Jiajun Liu
L. Tan
Yong Zhang
Feng Gao
Xiang Deng
Zhuoliang Kang
Xiaoming Wei
Y. Liu
VGen
210
1
0
02 Dec 2025
DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation
Weichuang Shao
I. Liao
Tomas Henrique Bode Maul
T. Chandesa
TTA
274
0
0
23 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
665
17
0
31 Oct 2025
AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
Weichuang Shao
I. Liao
Tomas Henrique Bode Maul
T. Chandesa
175
1
0
22 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
498
12
0
15 Oct 2025
Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng
Shilin Zhou
Chen Gong
Zhenghua Li
AuLLM, LRM
394
0
0
26 Sep 2025
Benchmarking Gaslighting Attacks Against Speech Large Language Models
Jinyang Wu
Bin Zhu
Xiandong Zou
Qiquan Zhang
Xu Fang
Pan Zhou
AAML
107
1
0
24 Sep 2025
SAM: A Mamba-2 State-Space Audio-Language Model
Taehan Lee
Jaehan Jung
Hyukjun Lee
Mamba, AuLLM
209
0
0
19 Sep 2025
SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding
Bingsong Bai
Qihang Lu
Wenbing Yang
Zihan Sun
Yueran Hou
...
Songbai Pu
Ruibo Fu
Y. Gao
Ya Li
Jun Gao
291
2
0
18 Sep 2025
Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models
Ilyass Moummad
Kawtar Zaher
Lukas Rauch
Alexis Joly
138
1
0
17 Sep 2025
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
Lu Wang
Hao Chen
Siyu Wu
Zhiyue Wu
Hao Zhou
C. Zhang
Ting Wang
Haodi Zhang
AuLLM
294
4
0
02 Sep 2025
IS$^3$: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Clémentine Berger
Paraskevas Stamatiadis
Roland Badeau
S. Essid
189
0
0
01 Sep 2025
CodecBench: A Comprehensive Benchmark for Acoustic and Semantic Evaluation
Ruifan Deng
Yitian Gong
Qinghui Gao
Luozhijie Jin
Qinyuan Cheng
Zhaoye Fei
Shimin Li
Xipeng Qiu
AuLLM
166
2
0
28 Aug 2025
OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue
Xuelong Geng
Qijie Shao
Hongfei Xue
Shuiyuan Wang
Hanke Xie
...
Longhao Li
Yuhang Dai
Dehui Gao
Dake Guo
Lei Xie
AuLLM
252
15
0
13 Aug 2025
A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding
Runchuan Ye
Yixuan Zhou
Renjie Yu
Zijian Lin
Kehan Li
Xiang Li
Xin Liu
Guoyang Zeng
Z. Wu
SLR
311
8
0
07 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, VLM
527
26
0
06 Aug 2025
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations
Huan Liao
Qinke Ni
Yuancheng Wang
Yiheng Lu
Haoyue Zhan
Pengyuan Xie
Qiang Zhang
Zhizheng Wu
271
13
0
06 Aug 2025
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
309
0
0
03 Aug 2025
Step-Audio 2 Technical Report
Boyong Wu
Chao Yan
Chen Hu
Cheng Yi
Chengli Feng
...
Yuanwei Lu
Yuchu Luo
Yuhe Yin
Yumeng Zhan
Y. Zhang
AuLLM
357
0
0
22 Jul 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuping Wang
Xie Chen
269
5
0
31 May 2025
StressTest: Can YOUR Speech LM Handle the Stress?
Iddo Yosha
Gallil Maimon
Yossi Adi
290
5
0
28 May 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
367
2
0
26 May 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
Yu-Gang Jiang
AuLLM, VGen
270
44
0
23 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
489
8
0
22 May 2025
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
Chun-Yi Kuan
Hung-yi Lee
389
7
0
20 May 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM, VLM
565
161
0
25 Apr 2025
Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang
Edison Thomaz
Lie Lu
296
0
0
18 Apr 2025
voc2vec: A Foundation Model for Non-Verbal Vocalization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Alkis Koudounas
Moreno La Quatra
Marco Sabato Siniscalchi
Elena Baralis
302
14
0
22 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM, SyDa, VLM
322
10
0
18 Feb 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
405
25
0
23 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Chun-Yi Kuan
Hung-yi Lee
AuLLM, LRM
485
25
0
03 Jan 2025
Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Gongyu Chen
Haomin Zhang
Chaofan Ding
Zihao Chen
Xinhan Di
276
1
0
23 Dec 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang
Jiahao Huo
Yibo Yan
Kun Wang
Yutao Yue
Xuming Hu
363
4
0
07 Oct 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
767
66
0
23 Sep 2024
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Dongheon Lee
Jung-Woo Choi
Mamba
229
13
0
19 Sep 2024
D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Hong-Hanh Nguyen-Le
Van-Tuan Tran
Dinh-Thuc Nguyen
Nhien-An Le-Khac
AAML
375
2
0
11 Sep 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM, VLM
450
498
0
15 Jul 2024
Domain Adaptation for Contrastive Audio-Language Models
Soham Deshmukh
Rita Singh
Bhiksha Raj
VLM
267
13
0
14 Feb 2024
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Seonmin Koo
Chanjun Park
Jinsung Kim
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
253
4
0
26 Jan 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
448
700
0
14 Nov 2023
Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuke Lin
Xiaoyi Qin
Ning Jiang
Guoqing Zhao
Ming Li
348
3
0
25 Sep 2023
Natural Language Supervision for General-Purpose Audio Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
AuLLM, AI4TS
301
114
0
11 Sep 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM, MLLM, LRM
831
241
0
18 May 2023
Active Learning of Non-semantic Speech Tasks with Pretrained Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Harlin Lee
Aaqib Saeed
Andrea L. Bertozzi
VLM
358
3
0
31 Oct 2022
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zaharah Bukhsh
Aaqib Saeed
OODD
162
16
0
27 Oct 2022
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices
Pattern Recognition Letters (PRL), 2022
Harlin Lee
Aaqib Saeed
326
3
0
12 Jul 2022
Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data
Knowledge Discovery and Data Mining (KDD), 2020
Chloë Brown
Jagmohan Chauhan
Andreas Grammenos
Jing Han
Apinan Hasthanasombat
Dimitris Spathis
Tong Xia
Pietro Cicuta
Cecilia Mascolo
620
434
0
10 Jun 2020