AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017

Hui Bu

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown

TST: Time-Sparse Transducer for Automatic Speech Recognition. CAAI International Conference on Artificial Intelligence (ICCAI), 2023

Jiangyan Yi

17 Jul 2023

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study. International Conference on Neural Information Processing (ICONIP), 2023

Zeping Min

Jinbo Wang

13 Jul 2023

SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding. Interspeech, 2023

Titouan Parcollet

Rogier van Dalen

Shucong Zhang

S. Bhattacharya

12 Jul 2023

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition. Interspeech, 2023

12 Jul 2023

Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound. IEEE Internet of Things Journal (IEEE IoT J.), 2023

Xinfeng Li

Xiaoyu Ji

28 Jun 2023

A Survey on Multimodal Large Language Models. National Science Review (NSR), 2023

Enhong Chen

23 Jun 2023

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition. Interspeech, 2023

20 Jun 2023

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure

14 Jun 2023

MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information. Interspeech, 2023

04 Jun 2023

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling. Interspeech, 2023

Ramon Sanabria

Ondřej Klejch

Hao Tang

Sharon Goldwater

03 Jun 2023

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning. Interspeech, 2023

01 Jun 2023

Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech. Interspeech, 2023

Shashi Kant Gupta

Sushant Hiray

Prashant Kukde

01 Jun 2023

VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Minglun Han

Bo Xu

31 May 2023

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

31 May 2023

Perception and Semantic Aware Regularization for Sequential Confidence Calibration. Computer Vision and Pattern Recognition (CVPR), 2023

Shuangping Huang

31 May 2023

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification. Interspeech, 2023

Ziqian Wang

30 May 2023

Investigating model performance in language identification: beyond simple error statistics. Interspeech, 2023

Leibny Paola García Perera

Sanjeev Khudanpur

Andy W. H. Khong

Justin Dauwels

30 May 2023

Speaker anonymization using orthogonal Householder neural network. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

Xin Wang

30 May 2023

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Longbiao Wang

29 May 2023

Bridging the Granularity Gap for Acoustic Modeling. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

JingBo Zhu

27 May 2023

DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution. Conference on Uncertainty in Artificial Intelligence (UAI), 2023

26 May 2023

InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition. Interspeech, 2023

Xinyuan Qian

24 May 2023

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding. Interspeech, 2023

Xinyuan Qian

23 May 2023

ADD 2023: the Second Audio Deepfake Detection Challenge

Jiangyan Yi

...

Haizhou Li

23 May 2023

CopyNE: Better Contextual ASR by Copying Named Entities. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

22 May 2023

GNCformer Enhanced Self-attention for Automatic Speech Recognition

22 May 2023

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition. Interspeech, 2023

Zhijian Ou

22 May 2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. Interspeech, 2023

Kwangyoun Kim

18 May 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit. Interspeech, 2023

...

18 May 2023

A Lexical-aware Non-autoregressive Transformer-based ASR Model. Interspeech, 2023

Chong Lin

Kuan-Yu Chen

18 May 2023

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. Interspeech, 2023

Binbin Zhang

Zhiyong Wu

18 May 2023

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

Minglun Han

Bo Xu

07 May 2023

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition. Interspeech, 2022

Mohan Li

R. Doddipatla

Catalin Zorila

24 Apr 2023

A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

15 Apr 2023

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

11 Apr 2023

TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized Perturbations. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

28 Mar 2023

Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

23 Mar 2023

Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition. International Symposium on Neural Networks (ISNN), 2023

Haoyu Tang

Zhaoyi Liu

Chang Zeng

Xinfeng Li

23 Mar 2023

Exploring Representation Learning for Small-Footprint Keyword Spotting. Interspeech, 2022

Liyong Guo

Yujun Wang

20 Mar 2023

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model. ACM Multimedia (MM), 2021

Shuangping Huang

13 Mar 2023

The System Description of dun_oscar team for The ICPR MSR Challenge

Binbin Du

Rui Deng

Yingxin Zhang

13 Mar 2023

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation. Interspeech, 2023

Minglun Han

Bo Xu

30 Jan 2023

Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neuroscience and Biobehavioral Reviews (NBR), 2022

Yuran Zhang

Jiajie Zou

Nai Ding

14 Jan 2023

Learning to Detect Noisy Labels Using Model-Based Features. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

28 Dec 2022

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Jinze Bai

Rui Men

Xuancheng Ren

...

Jianxin Ma

Jingren Zhou

Chang Zhou

08 Dec 2022

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition. AAAI Conference on Artificial Intelligence (AAAI), 2022

Xu Tan

Xiang-Yang Li

02 Dec 2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition. Interspeech, 2022

Xiaohuan Zhou

Jiaming Wang

Zeyu Cui

Shiliang Zhang

Zhijie Yan

Jingren Zhou

Chang Zhou

29 Nov 2022

Model Extraction Attack against Self-supervised Speech Models

29 Nov 2022

A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition

Jiacheng Zhang

Wenyi Yan

Ye Zhang

24 Nov 2022

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

23 Nov 2022