ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

31 August 2018
Jiayu Du
Xingyu Na
Xuechen Liu
Hui Bu
    VLM

Papers citing "AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale"

50 / 157 papers shown
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
85
23
0
27 Dec 2023
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou
Shiwan Zhao
Yaqi Liu
Wenjia Zeng
Yong Chen
Yong Qin
85
10
0
21 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
139
352
0
14 Nov 2023
CDSD: Chinese Dysarthria Speech Database
Mengyi Sun
Ming Gao
Xinchen Kang
Shiru Wang
Jun Du
Dengfeng Yao
Su-Jing Wang
137
3
0
24 Oct 2023
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
T. Park
He Huang
Coleman Hooper
Nithin Rao Koluguri
Kunal Dhawan
Ante Jukić
Jagadeesh Balam
Boris Ginsburg
63
7
0
18 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG, VLM, AuLLM, LM&MA
131
87
0
07 Oct 2023
Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression
Yixuan Zhang
Junkai Wang
Mengxue Hou
Dong Yu
57
2
0
27 Sep 2023
Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks
Huatian Zhang
Yixuan Zhang
Meng Yu
Dong Yu
73
3
0
27 Sep 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
W. Liu
Zhiyuan Peng
Tan Lee
52
2
0
21 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Jing Liu
Jianwei Yu
Xie Chen
87
1
0
18 Sep 2023
Unimodal Aggregation for CTC-based Speech Recognition
Ying Fang
Xiaofei Li
60
1
0
15 Sep 2023
SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu
Yong-mei Xu
Vinay Kothapally
Heming Wang
Muqiao Yang
Dong Yu
41
1
0
14 Sep 2023
CPPF: A contextual and post-processing-free model for automatic speech recognition
Lei Zhang
Zhengkun Tian
Xiang Chen
Jiaming Sun
Hongyu Xiang
Ke Ding
Guanglu Wan
67
0
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
102
63
0
14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Dan Su
Dong Yu
Helen M. Meng
63
0
0
04 Sep 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Jiaxu Zhu
Chang Song
Zhiyong Wu
Helen Meng
VLM
61
0
0
04 Sep 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
88
1
0
19 Aug 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Yusheng Dai
Hang Chen
Jun Du
xiao-ying Ding
Ning Ding
Feijun Jiang
Chin-Hui Lee
95
8
0
14 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
59
0
0
05 Aug 2023
Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays
Yijiang Chen
Chen Liang
Xiao-Lei Zhang
47
1
0
03 Jul 2023
Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction
Aoqi Guo
Junnan Wu
Peng Gao
Wenbo Zhu
Qinwen Guo
Dazhi Gao
Yujun Wang
34
1
0
28 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM, LRM
135
611
0
23 Jun 2023
MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information
Jianrong Wang
Yuchen Huo
Li Liu
Tianyi Xu
Qi Li
Sen Li
59
3
0
04 Jun 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
81
66
0
18 May 2023
A Lexical-aware Non-autoregressive Transformer-based ASR Model
Chong Lin
Kuan-Yu Chen
AI4TS
52
1
0
18 May 2023
Accented Text-to-Speech Synthesis with Limited Data
Xuehao Zhou
Mingyang Zhang
Yi Zhou
Zhizheng Wu
Haizhou Li
71
15
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
122
126
0
07 May 2023
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression
Huatian Zhang
Meng Yu
Yuzhong Wu
Tao Yu
Dong Yu
60
4
0
04 May 2023
Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings
Huatian Zhang
Meng Yu
Dong Yu
60
2
0
02 May 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
64
5
0
09 Mar 2023
Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression
Huatian Zhang
Meng Yu
Dong Yu
70
9
0
18 Feb 2023
NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation
Yixuan Zhang
Meng Yu
Huatian Zhang
Dong Yu
DeLiang Wang
67
7
0
29 Jan 2023
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
93
12
0
29 Nov 2022
Deep Neural Mel-Subband Beamformer for In-car Speech Separation
Vinay Kothapally
Yong-mei Xu
Meng Yu
Shizhong Zhang
Dong Yu
65
12
0
22 Nov 2022
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Yu Chen
Wen Ding
Junjie Lai
81
9
0
09 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
48
27
0
01 Nov 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
56
16
0
12 Oct 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai
Guoqiang Hong
Zhijian Ye
Ximin Li
Haizhou Li
119
7
0
23 Sep 2022
FRA-RIR: Fast Random Approximation of the Image-source Method
Yi Luo
Jianwei Yu
56
7
0
08 Aug 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
61
0
0
13 Jul 2022
The HCCL System for the NIST SRE21
Zhuo Li
Runqiu Xiao
Hangting Chen
Zhenduo Zhao
Zi-qiang Zhang
Wenchao Wang
52
0
0
11 Jul 2022
Minimizing Sequential Confusion Error in Speech Command Recognition
Zhanheng Yang
Hang Lv
Xiong Wang
Ao Zhang
Linfu Xie
27
0
0
04 Jul 2022
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Meng Ge
Longbiao Wang
Hao Shi
Yongjie Lv
Yuqin Lin
Jianwu Dang
72
27
0
29 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
79
108
0
16 Jun 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement
Meng Yu
Yong-mei Xu
Chunlei Zhang
Shizhong Zhang
Dong Yu
50
11
0
20 May 2022
Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention
Yanxiong Li
Wucheng Wang
Hao Chen
Wenchang Cao
Wei Li
Qianhua He
63
5
0
24 Apr 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
88
8
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
92
99
0
29 Mar 2022
Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
Keqi Deng
Songjun Cao
Yike Zhang
Long Ma
Gaofeng Cheng
Ji Xu
Pengyuan Zhang
28
27
0
22 Feb 2022
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge
Maokui He
Xiang Lv
Weilin Zhou
Jingjing Yin
Xiaoqi Zhang
...
Shutong Niu
Yuhang Cao
Heng Lu
Jun Du
Chin-Hui Lee
82
8
0
10 Feb 2022