AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017
Hui Bu
Jiayu Du
Xingyu Na
Bengu Wu
Hao Zheng
    CVBM

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
IEEE Signal Processing Letters (SPL), 2022
Fangyuan Wang
Bo Xu
Bo Xu
326
0
0
21 Nov 2022
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yu Chen
Wen Ding
Junjie Lai
267
11
0
09 Nov 2022
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Ao Zhang
F. Yu
Kaixun Huang
Linfu Xie
Longbiao Wang
Eng Siong Chng
Hui Bu
Binbin Zhang
Wei Chen
Xin Xu
200
5
0
03 Nov 2022
Towards Zero-Shot Code-Switched Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Brian Yan
Sanjeev Khudanpur
Ondřej Klejch
Preethi Jyothi
Shinji Watanabe
223
24
0
02 Nov 2022
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Haoyu Lu
Longbiao Wang
Hao Shi
Yuqin Lin
Yanbing Yang
Jianwu Dang
149
4
0
02 Nov 2022
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
337
16
0
02 Nov 2022
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xingcheng Song
Di Wu
Zhiyong Wu
Binbin Zhang
Yuekai Zhang
Zhendong Peng
Wenpeng Li
Fuping Pan
Changbao Zhu
235
12
0
01 Nov 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
133
2
0
31 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yosuke Higuchi
Brian Yan
Siddhant Arora
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
254
31
0
29 Oct 2022
SAN: a robust end-to-end ASR model architecture
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zeping Min
Qian Ge
Guanhua Huang
123
2
0
27 Oct 2022
V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization
Jiangyi Deng
Fei Teng
Yanjiao Chen
Xiaofu Chen
Zhaohui Wang
Wenyuan Xu
200
33
0
27 Oct 2022
Pronunciation Generation for Foreign Language Words in Intra-Sentential Code-Switching Speech Recognition
Wei Wang
Chao Zhang
Xiao-pei Wu
79
0
0
26 Oct 2022
10 hours data is all you need
Zeping Min
Qian Ge
Zhong Li
174
3
0
24 Oct 2022
spatial-dccrn: dccrn equipped with frame-level angle feature and hybrid filtering for multi-channel speech enhancement
Spoken Language Technology Workshop (SLT), 2022
Shubo Lv
Yihui Fu
Yukai Jv
Linfu Xie
Weixin Zhu
Wei Rao
Yannan Wang
152
11
0
17 Oct 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
183
4
0
16 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
176
1
0
14 Oct 2022
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Chao-Han Huck Yang
Jun Qi
Sabato Marco Siniscalchi
Chin-Hui Lee
180
4
0
12 Oct 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Spoken Language Technology Workshop (SLT), 2022
Keda Lu
Kuan-Yu Chen
166
24
0
12 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
129
0
0
06 Oct 2022
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
International Conference on Data Science and Advanced Analytics (DSAA), 2022
Chendong Zhao
Jianzong Wang
Wentao Wei
Xiaoyang Qu
Haoqian Wang
Jing Xiao
170
2
0
30 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Interspeech (Interspeech), 2022
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
140
3
0
17 Sep 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Spoken Language Technology Workshop (SLT), 2022
Peng Shen
Xugang Lu
Hisashi Kawai
106
2
0
29 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
230
12
0
24 Jul 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong
Zhikai Zhou
Y. Qian
225
6
0
15 Jul 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
161
0
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Speech Communication (Speech Commun.), 2022
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
Jianhua Tao
Tao Wang
Shiming Wang
Ruibo Fu
174
51
0
12 Jul 2022
The HCCL System for the NIST SRE21
Interspeech (Interspeech), 2022
Zhuo Li
Runqiu Xiao
Hangting Chen
Zhenduo Zhao
Zi-qiang Zhang
Wenchao Wang
111
0
0
11 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
International Conference on Machine Learning (ICML), 2022
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
265
191
0
06 Jul 2022
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Interspeech (Interspeech), 2022
Tongtong Song
Qiang Xu
Meng Ge
Longbiao Wang
Hao Shi
Yongjie Lv
Yuqin Lin
Jianwu Dang
194
35
0
29 Jun 2022
TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline
Interspeech (Interspeech), 2022
Chengfei Li
Shuhao Deng
Yaoping Wang
Guangjing Wang
Y. Gong
Changbin Chen
Jinfeng Bai
142
24
0
27 Jun 2022
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Interspeech (Interspeech), 2022
Zuheng Kang
Junqing Peng
Jianzong Wang
Jing Xiao
155
7
0
27 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Interspeech (Interspeech), 2022
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
235
179
0
16 Jun 2022
Residual Language Model for End-to-end Speech Recognition
Interspeech (Interspeech), 2022
E. Tsunoo
Yosuke Kashiwagi
Chaitanya Narisetty
Shinji Watanabe
144
11
0
15 Jun 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuting Yang
Yuke Li
Binbin Du
318
15
0
25 May 2022
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Yuting Yang
Binbin Du
Yuke Li
329
2
0
24 May 2022
PERT: A New Solution to Pinyin to Character Conversion Task
Jinghui Xiao
Qun Liu
Xin Jiang
Yuanfeng Xiong
Haiteng Wu
Zhe Zhang
98
2
0
24 May 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
655
442
0
21 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
135
31
0
20 May 2022
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Interspeech (Interspeech), 2022
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
232
19
0
18 May 2022
One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code
Yong Dai
Duyu Tang
Liangxin Liu
Minghuan Tan
Cong Zhou
Jingquan Wang
Zhangyin Feng
Fan Zhang
Xueyu Hu
Shuming Shi
VLM
MoE
160
32
0
12 May 2022
Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
Jiangyu Han
Yanhua Long
133
7
0
23 Apr 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
Interspeech (Interspeech), 2022
Zifeng Zhao
Rongzhi Gu
Dongchao Yang
Jinchuan Tian
Yuexian Zou
132
2
0
15 Apr 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
Nobuaki Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
173
10
0
08 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang
Lantian Li
Wenqiang Du
Dong Wang
280
0
0
08 Apr 2022
Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
Spoken Language Technology Workshop (SLT), 2022
Yusuke Fujita
Tatsuya Komatsu
Yusuke Kida
210
3
0
01 Apr 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Interspeech (Interspeech), 2022
Jaesong Lee
Lukas Lee
Shinji Watanabe
285
8
0
31 Mar 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset
Interspeech (Interspeech), 2022
Zehui Yang
Yifan Chen
Lei Luo
Runyan Yang
Lingxuan Ye
...
Yaohui Jin
Qingqing Zhang
Pengyuan Zhang
Lei Xie
Yonghong Yan
146
71
0
31 Mar 2022
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Interspeech (Interspeech), 2022
Huahuan Zheng
Keyu An
Zhijian Ou
Chen Huang
Ke Ding
Guanglu Wan
193
5
0
31 Mar 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Interspeech (Interspeech), 2022
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
159
21
0
31 Mar 2022
Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Keyu An
Ji Xiao
Zhijian Ou
107
3
0
31 Mar 2022