AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017

Hui Bu

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

Showing 50 of 451 citing papers

DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds

Takuya Hasumi

Yusuke Fujita

221

03 Jun 2025

XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark

Ioan-Paul Ciobanu

Andrei Iulian Hiji

Nicolae-Cătălin Ristea

Paul Irofti

Cristian Rusu

Radu Tudor Ionescu

168

31 May 2025

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation

243

30 May 2025

ZIPA: A family of efficient models for multilingual phone recognition

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

209

29 May 2025

Neuromorphic Sequential Arena: A Benchmark for Neuromorphic Temporal Processing

International Joint Conference on Artificial Intelligence (IJCAI), 2025

162

28 May 2025

Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis

262

27 May 2025

Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement

Yujie Yang

Bing Yang

Xiaofei Li

207

26 May 2025

ModRWKV: Transformer Multimodality in Linear Time

234

20 May 2025

Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR

317

19 May 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

...

247

06 May 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

351

01 May 2025

BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition

Computer Speech and Language (CSL), 2025

332

30 Apr 2025

Kimi-Audio Technical Report

...

429

124

25 Apr 2025

Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models

302

10 Apr 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

477

13 Mar 2025

Linguistic Knowledge Transfer Learning for Speech Enhancement

303

10 Mar 2025

Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching

...

219

05 Mar 2025

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

434

27 Feb 2025

Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision

...

597

26 Feb 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

...

609

26 Feb 2025

Audio-FLAN: A Preliminary Release

...

285

23 Feb 2025

CR-CTC: Consistency regularization on CTC for improved speech recognition

International Conference on Learning Representations (ICLR), 2024

381

17 Feb 2025

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

298

08 Feb 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

411

24 Jan 2025

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

...

329

23 Jan 2025

BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

284

22 Jan 2025

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification

Li Zhang

Jiyao Liu

Lei Xie

271

15 Jan 2025

FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Alef Iury Ferreira

L. Gris

Augusto Seben da Rosa

F. S. Oliveira

Edresson Casanova

R. T. Sousa

Arnaldo Cândido Júnior

A. S. Soares

A. R. G. Filho

168

09 Jan 2025

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

127

08 Jan 2025

Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition

318

15 Dec 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

...

371

12 Dec 2024

ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction

203

04 Dec 2024

FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning

Neural Information Processing Systems (NeurIPS), 2024

267

02 Dec 2024

Complexity boosted adaptive training for better low resource ASR performance

Hongxuan Lu

Shenjian Wang

Biao Li

280

01 Dec 2024

Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

Spoken Language Technology Workshop (SLT), 2024

338

26 Nov 2024

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition

Spoken Language Technology Workshop (SLT), 2024

264

11 Nov 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Chaoyou Fu

Ke Li

Long Ma

439

103

01 Nov 2024

Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?

Opeyemi Osakuade

Simon King

226

25 Oct 2024

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

193

21 Oct 2024

The First VoicePrivacy Attacker Challenge Evaluation Plan

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

442

09 Oct 2024

A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Houquan Zhou

Zhenghua Li

Bo Zhang

Chen Li

Shaopeng Lai

Ji Zhang

Fei Huang

Hao Fei

LRM

297

05 Oct 2024

Automated Tone Transcription and Clustering with Tone2Vec

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

03 Oct 2024

Mamba for Streaming ASR Combined with Unimodal Aggregation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ying Fang

Xiaofei Li

Mamba

220

30 Sep 2024

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Bingshen Mu

Kun Wei

Qijie Shao

Yong Xu

Lei Xie

MoE

496

30 Sep 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Computer Vision and Pattern Recognition (CVPR), 2024

Kai Chen

Zhili Liu

...

Jun Yao

437

26 Sep 2024

Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

168

25 Sep 2024

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2024

192

24 Sep 2024

Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Fengrun Zhang

Wang Geng

193

24 Sep 2024

OmniBench: Towards The Future of Universal Omni-Language Models

...

611

23 Sep 2024

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

Computer Science Review (CSR), 2024

Lam Pham

Phat Lam

Dat Tran

Hieu Tang

Tin Nguyen

Alexander Schindler

Alexander Polonsky

Canh Vu

522

23 Sep 2024