arXiv:1709.05522

AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017
Hui Bu
Jiayu Du
Xingyu Na
Bengu Wu
Hao Zheng
    CVBM

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
130
8
0
09 Mar 2024
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
Tzu-Ting Yang
Hsin-Wei Wang
Yi-Cheng Wang
Chi-Han Lin
Berlin Chen
217
11
0
27 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
339
35
0
20 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
466
26
0
19 Feb 2024
UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
244
2
0
14 Feb 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM, OSLM
304
88
0
30 Jan 2024
Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024
Peng Huang
Yao Wei
Jun Zhou
Zhongjie Ba
Liwang Lu
Feng Lin
Yang Wang
Kui Ren
182
1
0
28 Jan 2024
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Seonmin Koo
Chanjun Park
Jinsung Kim
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
179
4
0
26 Jan 2024
Using Large Language Model for End-to-End Chinese ASR and NER
Interspeech (Interspeech), 2024
Yuang Li
Jiawei Yu
Min Zhang
Mengxin Ren
Yanqing Zhao
Xiaofeng Zhao
Miaomiao Ma
Yan Yu
Hao Yang
315
13
0
21 Jan 2024
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiaxin Guo
Minghan Wang
Xiaosong Qiao
Daimeng Wei
Hengchao Shang
...
Yinglu Li
Yan Yu
Min Zhang
Shimin Tao
Hao Yang
160
6
0
11 Jan 2024
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2023
Hongfei Xue
Yuhao Liang
Bingshen Mu
Shiliang Zhang
Mengzhe Chen
Qian Chen
Lei Xie
AuLLM
378
24
0
31 Dec 2023
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou
Shiwan Zhao
Yaqi Liu
Wenjia Zeng
Yong Chen
Yong Qin
306
17
0
21 Dec 2023
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Automatic Speech Recognition & Understanding (ASRU), 2023
Aoting Zhang
Pan Zhou
Kaixun Huang
Yong Zou
Ming Liu
Lei Xie
198
8
0
15 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
317
588
0
14 Nov 2023
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
Interspeech (Interspeech), 2023
Yiwen Shao
Shi-Xiong Zhang
Dong Yu
400
1
0
31 Oct 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Valerio Francesco Puglisi
O. Giudice
Sebastiano Battiato
193
1
0
29 Oct 2023
CDSD: Chinese Dysarthria Speech Database
Interspeech (Interspeech), 2023
Mengyi Sun
Ming Gao
Xinchen Kang
Shiru Wang
Jun Du
Dengfeng Yao
Su-Jing Wang
377
7
0
24 Oct 2023
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition
IEEE Signal Processing Letters (IEEE SPL), 2023
Peng Fan
Changhao Shan
Sining Sun
Qing Yang
Jianwei Zhang
234
4
0
23 Oct 2023
Generative error correction for code-switching speech recognition using large language models
Chen Chen
Yuchen Hu
Chao-Han Huck Yang
Hexin Liu
Sabato Marco Siniscalchi
Chng Eng Siong
181
10
0
17 Oct 2023
Zipformer: A faster and better encoder for automatic speech recognition
International Conference on Learning Representations (ICLR), 2023
Zengwei Yao
Liyong Guo
Xiaoyu Yang
Wei Kang
Fangjun Kuang
Yifan Yang
Zengrui Jin
Long Lin
Daniel Povey
VLM
413
131
0
17 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG, VLM, AuLLM, LM&MA
444
101
0
07 Oct 2023
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ambar Pal
Jeremias Sulam
Yu Tsao
Rene Vidal
149
4
0
28 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
286
59
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Chen Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
212
61
0
27 Sep 2023
Speech collage: code-switched audio generation by collaging monolingual corpora
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Hussein
Dorsa Zeinali
Ondřej Klejch
Sanjeev Khudanpur
Brian Yan
Shammur A. Chowdhury
Ahmed M. Ali
Shinji Watanabe
Sanjeev Khudanpur
204
10
0
27 Sep 2023
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Automatic Speech Recognition & Understanding (ASRU), 2023
Masao Someki
N. Eng
Yosuke Higuchi
Shinji Watanabe
283
1
0
26 Sep 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
305
6
0
26 Sep 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Automatic Speech Recognition & Understanding (ASRU), 2023
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
347
60
0
25 Sep 2023
Cross-modal Alignment with Optimal Transport for CTC-based ASR
Automatic Speech Recognition & Understanding (ASRU), 2023
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
264
8
0
24 Sep 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
Interspeech (Interspeech), 2023
W. Liu
Zhiyuan Peng
Tan Lee
189
2
0
21 Sep 2023
Sparsely Shared LoRA on Whisper for Child Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
W. Liu
Ying Qin
Zhiyuan Peng
Tan Lee
268
30
0
21 Sep 2023
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
Interspeech (Interspeech), 2023
Yi-Wei Wang
Keda Lu
Kuan-Yu Chen
295
4
0
18 Sep 2023
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Interspeech (Interspeech), 2023
Yuang Li
Min Zhang
Yan Yu
Yinglu Li
Xiaosong Qiao
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Shimin Tao
Hao Yang
218
9
0
18 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Interspeech (Interspeech), 2023
Jing Liu
Jianwei Yu
Xie Chen
326
2
0
18 Sep 2023
Unimodal Aggregation for CTC-based Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ying Fang
Xiaofei Li
240
4
0
15 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
248
95
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
214
40
0
14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Interspeech (Interspeech), 2023
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Jane Polak Scowcroft
Dong Yu
Helen M. Meng
164
0
0
04 Sep 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Interspeech (Interspeech), 2023
Jiaxu Zhu
Chang Song
Zhiyong Wu
Helen Meng
VLM
242
0
0
04 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
240
5
0
02 Sep 2023
Dual-path Transformer Based Neural Beamformer for Target Speech Extraction
Aoqi Guo
Sichong Qian
Baoxiang Li
Dazhi Gao
223
3
0
30 Aug 2023
LLaSM: Large Language and Speech Model
Yu Shu
Siwei Dong
Guangyao Chen
Wen-Fen Huang
Ruihua Zhang
Daochen Shi
Qiqi Xiang
Yemin Shi
AuLLM
294
59
0
30 Aug 2023
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Oriental COCOSDA International Conference on Speech Database and Assessments (COCOSDA), 2023
Jia-Jyu Su
Pang-Chen Liao
Yen-Ting Lin
Wu-Hao Li
Guan-Ting Liou
...
Wei-Cheng Chen
Jen-Chieh Chiang
Wen-Yang Chang
Pin-Han Lin
Chen-Yu Chiang
106
3
0
27 Aug 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Interspeech (Interspeech), 2023
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
182
1
0
19 Aug 2023
Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
Daobin Zhu
Xiangdong Su
Hongbin Zhang
217
2
0
15 Aug 2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Hanjing Zhu
Dongji Gao
Gaofeng Cheng
Daniel Povey
Pengyuan Zhang
Yonghong Yan
NoLa
243
11
0
12 Aug 2023
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xian Shi
Yexin Yang
Zerui Li
Yanni Chen
Zhifu Gao
Shiliang Zhang
259
21
0
07 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
154
0
0
05 Aug 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
248
7
0
26 Jul 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
94
0
0
18 Jul 2023