AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017

Hui Bu

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown

HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling

Chunhui Wang

Chang Zeng

Jian Zhao

Yong Chen

130

09 Mar 2024

An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

217

27 Feb 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Shinji Watanabe

339

20 Feb 2024

Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

466

19 Feb 2024

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models

Ruchao Fan

Natarajan Balaji Shankar

Abeer Alwan

244

14 Feb 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

...

Jiatong Shi

Shinji Watanabe

304

30 Jan 2024

Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024

182

28 Jan 2024

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

179

26 Jan 2024

Using Large Language Model for End-to-End Chinese ASR and NER

Interspeech (Interspeech), 2024

Min Zhang

315

21 Jan 2024

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiaxin Guo

...

Min Zhang

160

11 Jan 2024

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2023

Qian Chen

Lei Xie

AuLLM

378

31 Dec 2023

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels

Jiaming Zhou

Yong Qin

306

21 Dec 2023

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Automatic Speech Recognition & Understanding (ASRU), 2023

Lei Xie

198

15 Dec 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Yunfei Chu

Jin Xu

Xiaohuan Zhou

Qian Yang

Shiliang Zhang

Zhijie Yan

Chang Zhou

Jingren Zhou

AuLLM

317

588

14 Nov 2023

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios

Interspeech (Interspeech), 2023

Yiwen Shao

Shi-Xiong Zhang

Dong Yu

400

31 Oct 2023

Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics

Valerio Francesco Puglisi

O. Giudice

Sebastiano Battiato

193

29 Oct 2023

CDSD: Chinese Dysarthria Speech Database

Interspeech (Interspeech), 2023

377

24 Oct 2023

Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

IEEE Signal Processing Letters (IEEE SPL), 2023

234

23 Oct 2023

Generative error correction for code-switching speech recognition using large language models

Chen Chen

Yuchen Hu

Chao-Han Huck Yang

Hexin Liu

Sabato Marco Siniscalchi

Chng Eng Siong

181

17 Oct 2023

Zipformer: A faster and better encoder for automatic speech recognition

International Conference on Learning Representations (ICLR), 2023

Zengwei Yao

Liyong Guo

Xiaoyu Yang

Wei Kang

Fangjun Kuang

Yifan Yang

Zengrui Jin

Long Lin

Daniel Povey

VLM

413

131

17 Oct 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Zhihao Du

Jiaming Wang

Qian Chen

Yunfei Chu

Zhifu Gao

...

Wen Wang

Siqi Zheng

Chang Zhou

Zhijie Yan

Shiliang Zhang

LLMAG VLM AuLLM LM&MA

444

101

07 Oct 2023

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Ambar Pal

Jeremias Sulam

Yu Tsao

Rene Vidal

149

28 Sep 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

Yao-Fei Cheng

286

27 Sep 2023

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cheng Chen

Yuchen Hu

Chao-Han Huck Yang

Sabato Marco Siniscalchi

Pin-Yu Chen

Eng Siong Chng

212

27 Sep 2023

Speech collage: code-switched audio generation by collaging monolingual corpora

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Sanjeev Khudanpur

204

27 Sep 2023

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Automatic Speech Recognition & Understanding (ASRU), 2023

283

26 Sep 2023

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR

Keyu An

Shiliang Zhang

305

26 Sep 2023

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Automatic Speech Recognition & Understanding (ASRU), 2023

...

347

25 Sep 2023

Cross-modal Alignment with Optimal Transport for CTC-based ASR

Automatic Speech Recognition & Understanding (ASRU), 2023

Xugang Lu

Peng Shen

Yu Tsao

Hisashi Kawai

264

24 Sep 2023

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning

Interspeech (Interspeech), 2023

W. Liu

Zhiyuan Peng

Tan Lee

189

21 Sep 2023

Sparsely Shared LoRA on Whisper for Child Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

268

21 Sep 2023

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

Interspeech (Interspeech), 2023

Yi-Wei Wang

Keda Lu

Kuan-Yu Chen

295

18 Sep 2023

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Interspeech (Interspeech), 2023

Min Zhang

218

18 Sep 2023

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

Interspeech (Interspeech), 2023

Jing Liu

Jianwei Yu

Xie Chen

326

18 Sep 2023

Unimodal Aggregation for CTC-based Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Ying Fang

Xiaofei Li

240

15 Sep 2023

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zhihao Du

Shiliang Zhang

Kai Hu

Siqi Zheng

248

14 Sep 2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Yifan Yang

Chenpeng Du

Xie Chen

214

14 Sep 2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Interspeech (Interspeech), 2023

Zhiyong Wu

164

04 Sep 2023

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Interspeech (Interspeech), 2023

Zhiyong Wu

242

04 Sep 2023

Timbre-reserved Adversarial Attack in Speaker Identification

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

240

02 Sep 2023

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

223

30 Aug 2023

LLaSM: Large Language and Speech Model

294

30 Aug 2023

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Oriental COCOSDA International Conference on Speech Database and Assessments (COCOSDA), 2023

Jia-Jyu Su

...

106

27 Aug 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Interspeech (Interspeech), 2023

Dong Yu

Shinji Watanabe

182

19 Aug 2023

Improving CTC-AED model with integrated-CTC and auxiliary loss regularization

Daobin Zhu

Xiangdong Su

Hongbin Zhang

217

15 Aug 2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Pengyuan Zhang

243

12 Aug 2023

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

259

07 Aug 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

154

05 Aug 2023

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiaming Zhou

248

26 Jul 2023

OxfordVGG Submission to the EGO4D AV Transcription Challenge

Jaesung Huh

Max Bain

Andrew Zisserman

18 Jul 2023