1808.10583
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
31 August 2018
Jiayu Du
Xingyu Na
Xuechen Liu
Hui Bu
VLM
Papers citing "AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale"
50 / 157 papers shown
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
85
23
0
27 Dec 2023
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou
Shiwan Zhao
Yaqi Liu
Wenjia Zeng
Yong Chen
Yong Qin
85
10
0
21 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
139
352
0
14 Nov 2023
CDSD: Chinese Dysarthria Speech Database
Mengyi Sun
Ming Gao
Xinchen Kang
Shiru Wang
Jun Du
Dengfeng Yao
Su-Jing Wang
137
3
0
24 Oct 2023
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
T. Park
He Huang
Coleman Hooper
Nithin Rao Koluguri
Kunal Dhawan
Ante Jukić
Jagadeesh Balam
Boris Ginsburg
63
7
0
18 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
131
87
0
07 Oct 2023
Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression
Yixuan Zhang
Junkai Wang
Mengxue Hou
Dong Yu
57
2
0
27 Sep 2023
Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks
Huatian Zhang
Yixuan Zhang
Meng Yu
Dong Yu
73
3
0
27 Sep 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
W. Liu
Zhiyuan Peng
Tan Lee
52
2
0
21 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Jing Liu
Jianwei Yu
Xie Chen
87
1
0
18 Sep 2023
Unimodal Aggregation for CTC-based Speech Recognition
Ying Fang
Xiaofei Li
60
1
0
15 Sep 2023
SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu
Yong-mei Xu
Vinay Kothapally
Heming Wang
Muqiao Yang
Dong Yu
41
1
0
14 Sep 2023
CPPF: A contextual and post-processing-free model for automatic speech recognition
Lei Zhang
Zhengkun Tian
Xiang Chen
Jiaming Sun
Hongyu Xiang
Ke Ding
Guanglu Wan
67
0
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
102
63
0
14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Dan Su
Dong Yu
Helen M. Meng
63
0
0
04 Sep 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Jiaxu Zhu
Chang Song
Zhiyong Wu
Helen Meng
VLM
61
0
0
04 Sep 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
88
1
0
19 Aug 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Yusheng Dai
Hang Chen
Jun Du
Xiao-ying Ding
Ning Ding
Feijun Jiang
Chin-Hui Lee
95
8
0
14 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
59
0
0
05 Aug 2023
Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays
Yijiang Chen
Chen Liang
Xiao-Lei Zhang
47
1
0
03 Jul 2023
Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction
Aoqi Guo
Junnan Wu
Peng Gao
Wenbo Zhu
Qinwen Guo
Dazhi Gao
Yujun Wang
34
1
0
28 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
135
611
0
23 Jun 2023
MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information
Jianrong Wang
Yuchen Huo
Li Liu
Tianyi Xu
Qi Li
Sen Li
59
3
0
04 Jun 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
81
66
0
18 May 2023
A Lexical-aware Non-autoregressive Transformer-based ASR Model
Chong Lin
Kuan-Yu Chen
AI4TS
52
1
0
18 May 2023
Accented Text-to-Speech Synthesis with Limited Data
Xuehao Zhou
Mingyang Zhang
Yi Zhou
Zhizheng Wu
Haizhou Li
71
15
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
122
126
0
07 May 2023
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression
Huatian Zhang
Meng Yu
Yuzhong Wu
Tao Yu
Dong Yu
60
4
0
04 May 2023
Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings
Huatian Zhang
Meng Yu
Dong Yu
60
2
0
02 May 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
64
5
0
09 Mar 2023
Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression
Huatian Zhang
Meng Yu
Dong Yu
70
9
0
18 Feb 2023
NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation
Yixuan Zhang
Meng Yu
Huatian Zhang
Dong Yu
DeLiang Wang
67
7
0
29 Jan 2023
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
93
12
0
29 Nov 2022
Deep Neural Mel-Subband Beamformer for In-car Speech Separation
Vinay Kothapally
Yong-mei Xu
Meng Yu
Shizhong Zhang
Dong Yu
65
12
0
22 Nov 2022
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Yu Chen
Wen Ding
Junjie Lai
81
9
0
09 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
48
27
0
01 Nov 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
56
16
0
12 Oct 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai
Guoqiang Hong
Zhijian Ye
Ximin Li
Haizhou Li
119
7
0
23 Sep 2022
FRA-RIR: Fast Random Approximation of the Image-source Method
Yi Luo
Jianwei Yu
56
7
0
08 Aug 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
61
0
0
13 Jul 2022
The HCCL System for the NIST SRE21
Zhuo Li
Runqiu Xiao
Hangting Chen
Zhenduo Zhao
Zi-qiang Zhang
Wenchao Wang
52
0
0
11 Jul 2022
Minimizing Sequential Confusion Error in Speech Command Recognition
Zhanheng Yang
Hang Lv
Xiong Wang
Ao Zhang
Linfu Xie
27
0
0
04 Jul 2022
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Meng Ge
Longbiao Wang
Hao Shi
Yongjie Lv
Yuqin Lin
Jianwu Dang
72
27
0
29 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ian McLoughlin
Zhijie Yan
79
108
0
16 Jun 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement
Meng Yu
Yong-mei Xu
Chunlei Zhang
Shizhong Zhang
Dong Yu
50
11
0
20 May 2022
Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention
Yanxiong Li
Wucheng Wang
Hao Chen
Wenchang Cao
Wei Li
Qianhua He
63
5
0
24 Apr 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
88
8
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
92
99
0
29 Mar 2022
Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
Keqi Deng
Songjun Cao
Yike Zhang
Long Ma
Gaofeng Cheng
Ji Xu
Pengyuan Zhang
28
27
0
22 Feb 2022
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge
Maokui He
Xiang Lv
Weilin Zhou
Jingjing Yin
Xiaoqi Zhang
...
Shutong Niu
Yuhang Cao
Heng Lu
Jun Du
Chin-Hui Lee
82
8
0
10 Feb 2022