AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017
Hui Bu
Jiayu Du
Xingyu Na
Bengu Wu
Hao Zheng
    CVBM

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown
DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
Takuya Hasumi
Yusuke Fujita
221
0
0
03 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
168
0
0
31 May 2025
ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation
Jiatong Shi
Yifan Cheng
Bo-Hao Su
Hye-jin Shim
Jinchuan Tian
Samuele Cornell
Yiwen Zhao
Siddhant Arora
Shinji Watanabe
243
0
0
30 May 2025
ZIPA: A family of efficient models for multilingual phone recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
209
4
0
29 May 2025
Neuromorphic Sequential Arena: A Benchmark for Neuromorphic Temporal Processing
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Xinyi Chen
Chenxiang Ma
Yujie Wu
Kay Chen Tan
Jibin Wu
AI4TS
162
3
0
28 May 2025
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
Tianyi Xu
Hongjie Chen
Wang Qing
Lv Hang
Jian Kang
Li Jie
Zhennan Lin
Yongxiang Li
Xie Lei
262
4
0
27 May 2025
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
Yujie Yang
Bing Yang
Xiaofei Li
207
0
0
26 May 2025
ModRWKV: Transformer Multimodality in Linear Time
Jiale Kang
Ziyin Yue
Qingyu Yin
Jiang Rui
W. Li
Zening Lu
Zhouran Ji
OffRL
234
0
0
20 May 2025
Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
OT
317
0
0
19 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Chunjiang Ge
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
247
17
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
351
3
0
01 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Computer Speech and Language (CSL), 2025
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
332
3
0
30 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM VLM
429
124
0
25 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM VLM
302
1
0
10 Apr 2025
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiaming Zhou
Songtao Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
477
5
0
13 Mar 2025
Linguistic Knowledge Transfer Learning for Speech Enhancement
Kuo-Hsuan Hung
Xugang Lu
Szu-Wei Fu
Huan-Hsin Tseng
Hsin-Yi Lin
Chii-Wann Lin
Yu Tsao
VLM
303
1
0
10 Mar 2025
Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching
Haiyue Zu
Jun Ge
Heting Xiao
Jile Xie
Zhangzhe Zhou
...
Jiayi Ni
Junjie Niu
Linlin Zhang
Li Ni
Huilin Yang
MedIm VLM
219
3
0
05 Mar 2025
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Nian Shao
Rui Zhou
Pengyu Wang
Xian Li
Ying Fang
Yujie Yang
Xiaofei Li
434
5
0
27 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
Junwei Liao
Haipang Wu
Ji Liu
André Freitas
Qifan Wang
AuLLM
597
7
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM VLM
609
12
0
26 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Wenhan Luo
Wei Xue
MLLM AuLLM CLIP VLM
285
2
0
23 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
International Conference on Learning Representations (ICLR), 2024
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
381
13
0
17 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
298
25
0
08 Feb 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
411
41
0
24 Jan 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
329
21
0
23 Jan 2025
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Guodong Ma
Wenxuan Wang
Lifeng Zhou
Yuting Yang
Yuke Li
Binbin Du
MoE
284
4
0
22 Jan 2025
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
Li Zhang
Jiyao Liu
Lei Xie
271
0
0
15 Jan 2025
FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Alef Iury Ferreira
L. Gris
Augusto Seben da Rosa
F. S. Oliveira
Edresson Casanova
R. T. Sousa
Arnaldo Cândido Júnior
A. S. Soares
A. R. G. Filho
168
3
0
09 Jan 2025
Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Wei Zhang
Tian-Hao Zhang
Chao Luo
Hui Zhou
Chao Yang
Xinyuan Qian
Xu-cheng Yin
127
0
0
08 Jan 2025
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition
Han Zhu
Gaofeng Cheng
Qingwei Zhao
Pengyuan Zhang
VLM
318
0
0
15 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
371
34
0
12 Dec 2024
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
Victor Junqiu Wei
Weicheng Wang
Chen Zhang
Wailing Ng
Lu Wang
203
8
0
04 Dec 2024
FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning
Neural Information Processing Systems (NeurIPS), 2024
Lisha Chen
A. F. M. Saif
Yanning Shen
Tianyi Chen
267
4
0
02 Dec 2024
Complexity boosted adaptive training for better low resource ASR performance
Hongxuan Lu
Shenjian Wang
Biao Li
280
0
0
01 Dec 2024
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
Spoken Language Technology Workshop (SLT), 2024
Tzu-Ting Yang
Hsin-Wei Wang
Yi-Cheng Wang
Berlin Chen
338
0
0
26 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Spoken Language Technology Workshop (SLT), 2024
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
264
6
0
11 Nov 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang
Yangze Li
Chaoyou Fu
Chunjiang Ge
Lei Xie
Ke Li
Xing Sun
Long Ma
AuLLM MLLM
439
103
0
01 Nov 2024
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
Opeyemi Osakuade
Simon King
226
2
0
25 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang
Fengping Wang
Yicheng Zhong
Huawei Wei
Zhisheng Wang
193
1
0
21 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
442
9
0
09 Oct 2024
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Houquan Zhou
Zhenghua Li
Bo Zhang
Chen Li
Shaopeng Lai
Ji Zhang
Fei Huang
Hao Fei
LRM
297
6
0
05 Oct 2024
Automated Tone Transcription and Clustering with Tone2Vec
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yi Yang
Yiming Wang
ZhiQiang Tang
Jiahong Yuan
82
1
0
03 Oct 2024
Mamba for Streaming ASR Combined with Unimodal Aggregation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ying Fang
Xiaofei Li
Mamba
220
9
0
30 Sep 2024
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Bingshen Mu
Kun Wei
Qijie Shao
Yong Xu
Lei Xie
MoE
496
12
0
30 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM MLLM VLM
437
44
0
26 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
168
1
0
25 Sep 2024
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2024
Yang Yuhang
Peng Yizhou
Eng Siong Chng
Xionghu Zhong
AuLLM AI4CE
192
2
0
24 Sep 2024
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Fengrun Zhang
Wang Geng
Hukai Huang
Cheng Yi
He Qu
AuLLM MoE
193
9
0
24 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
611
51
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Computer Science Review (CSR), 2024
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
522
14
0
23 Sep 2024