arXiv:1709.05522
AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
16 September 2017
Hui Bu
Jiayu Du
Xingyu Na
Bengu Wu
Hao Zheng
CVBM
Papers citing
"AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"
50 / 451 papers shown
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
130
8
0
09 Mar 2024
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
Tzu-Ting Yang
Hsin-Wei Wang
Yi-Cheng Wang
Chi-Han Lin
Berlin Chen
217
11
0
27 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
339
35
0
20 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
466
26
0
19 Feb 2024
UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
244
2
0
14 Feb 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM
OSLM
304
88
0
30 Jan 2024
Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024
Peng Huang
Yao Wei
Jun Zhou
Zhongjie Ba
Liwang Lu
Feng Lin
Yang Wang
Kui Ren
182
1
0
28 Jan 2024
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Seonmin Koo
Chanjun Park
Jinsung Kim
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
179
4
0
26 Jan 2024
Using Large Language Model for End-to-End Chinese ASR and NER
Interspeech (Interspeech), 2024
Yuang Li
Jiawei Yu
Min Zhang
Mengxin Ren
Yanqing Zhao
Xiaofeng Zhao
Miaomiao Ma
Yan Yu
Hao Yang
315
13
0
21 Jan 2024
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiaxin Guo
Minghan Wang
Xiaosong Qiao
Daimeng Wei
Hengchao Shang
...
Yinglu Li
Yan Yu
Min Zhang
Shimin Tao
Hao Yang
160
6
0
11 Jan 2024
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2023
Hongfei Xue
Yuhao Liang
Bingshen Mu
Shiliang Zhang
Mengzhe Chen
Qian Chen
Lei Xie
AuLLM
378
24
0
31 Dec 2023
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou
Shiwan Zhao
Yaqi Liu
Wenjia Zeng
Yong Chen
Yong Qin
306
17
0
21 Dec 2023
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Automatic Speech Recognition & Understanding (ASRU), 2023
Aoting Zhang
Pan Zhou
Kaixun Huang
Yong Zou
Ming Liu
Lei Xie
198
8
0
15 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
317
588
0
14 Nov 2023
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
Interspeech (Interspeech), 2023
Yiwen Shao
Shi-Xiong Zhang
Dong Yu
400
1
0
31 Oct 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Valerio Francesco Puglisi
O. Giudice
Sebastiano Battiato
193
1
0
29 Oct 2023
CDSD: Chinese Dysarthria Speech Database
Interspeech (Interspeech), 2023
Mengyi Sun
Ming Gao
Xinchen Kang
Shiru Wang
Jun Du
Dengfeng Yao
Su-Jing Wang
377
7
0
24 Oct 2023
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition
IEEE Signal Processing Letters (IEEE SPL), 2023
Peng Fan
Changhao Shan
Sining Sun
Qing Yang
Jianwei Zhang
234
4
0
23 Oct 2023
Generative error correction for code-switching speech recognition using large language models
Chen Chen
Yuchen Hu
Chao-Han Huck Yang
Hexin Liu
Sabato Marco Siniscalchi
Chng Eng Siong
181
10
0
17 Oct 2023
Zipformer: A faster and better encoder for automatic speech recognition
International Conference on Learning Representations (ICLR), 2023
Zengwei Yao
Liyong Guo
Xiaoyu Yang
Wei Kang
Fangjun Kuang
Yifan Yang
Zengrui Jin
Long Lin
Daniel Povey
VLM
413
131
0
17 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
444
101
0
07 Oct 2023
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ambar Pal
Jeremias Sulam
Yu Tsao
Rene Vidal
149
4
0
28 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
286
59
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
212
61
0
27 Sep 2023
Speech collage: code-switched audio generation by collaging monolingual corpora
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Hussein
Dorsa Zeinali
Ondřej Klejch
Sanjeev Khudanpur
Brian Yan
Shammur A. Chowdhury
Ahmed M. Ali
Shinji Watanabe
Sanjeev Khudanpur
204
10
0
27 Sep 2023
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Automatic Speech Recognition & Understanding (ASRU), 2023
Masao Someki
N. Eng
Yosuke Higuchi
Shinji Watanabe
283
1
0
26 Sep 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
305
6
0
26 Sep 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Automatic Speech Recognition & Understanding (ASRU), 2023
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
347
60
0
25 Sep 2023
Cross-modal Alignment with Optimal Transport for CTC-based ASR
Automatic Speech Recognition & Understanding (ASRU), 2023
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
264
8
0
24 Sep 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
Interspeech (Interspeech), 2023
W. Liu
Zhiyuan Peng
Tan Lee
189
2
0
21 Sep 2023
Sparsely Shared LoRA on Whisper for Child Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
W. Liu
Ying Qin
Zhiyuan Peng
Tan Lee
268
30
0
21 Sep 2023
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
Interspeech (Interspeech), 2023
Yi-Wei Wang
Keda Lu
Kuan-Yu Chen
295
4
0
18 Sep 2023
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Interspeech (Interspeech), 2023
Yuang Li
Min Zhang
Yan Yu
Yinglu Li
Xiaosong Qiao
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Shimin Tao
Hao Yang
218
9
0
18 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Interspeech (Interspeech), 2023
Jing Liu
Jianwei Yu
Xie Chen
326
2
0
18 Sep 2023
Unimodal Aggregation for CTC-based Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ying Fang
Xiaofei Li
240
4
0
15 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
248
95
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
214
40
0
14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Interspeech (Interspeech), 2023
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Jane Polak Scowcroft
Dong Yu
Helen M. Meng
164
0
0
04 Sep 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Interspeech (Interspeech), 2023
Jiaxu Zhu
Chang Song
Zhiyong Wu
Helen Meng
VLM
242
0
0
04 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
240
5
0
02 Sep 2023
Dual-path Transformer Based Neural Beamformer for Target Speech Extraction
Aoqi Guo
Sichong Qian
Baoxiang Li
Dazhi Gao
223
3
0
30 Aug 2023
LLaSM: Large Language and Speech Model
Yu Shu
Siwei Dong
Guangyao Chen
Wen-Fen Huang
Ruihua Zhang
Daochen Shi
Qiqi Xiang
Yemin Shi
AuLLM
294
59
0
30 Aug 2023
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Oriental COCOSDA International Conference on Speech Database and Assessments (COCOSDA), 2023
Jia-Jyu Su
Pang-Chen Liao
Yen-Ting Lin
Wu-Hao Li
Guan-Ting Liou
...
Wei-Cheng Chen
Jen-Chieh Chiang
Wen-Yang Chang
Pin-Han Lin
Chen-Yu Chiang
106
3
0
27 Aug 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Interspeech (Interspeech), 2023
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
182
1
0
19 Aug 2023
Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
Daobin Zhu
Xiangdong Su
Hongbin Zhang
217
2
0
15 Aug 2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Hanjing Zhu
Dongji Gao
Gaofeng Cheng
Daniel Povey
Pengyuan Zhang
Yonghong Yan
NoLa
243
11
0
12 Aug 2023
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xian Shi
Yexin Yang
Zerui Li
Yanni Chen
Zhifu Gao
Shiliang Zhang
259
21
0
07 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
154
0
0
05 Aug 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
248
7
0
26 Jul 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
94
0
0
18 Jul 2023