AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

31 August 2018
Jiayu Du
Xingyu Na
Xuechen Liu
Hui Bu
    VLM

Papers citing "AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale"

50 / 157 papers shown
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
69
1
0
01 Jul 2025
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel
Zhiyong Yan
Tianzi Wang
Yongqing Wang
Xingwei Sun
Yadong Niu
Jizhong Liu
Gang Li
Junbo Zhang
Jian Luan
CLIP, VLM
15
0
0
12 Jun 2025
DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
Haotian Guo
Jing Han
Yongfeng Tu
Shihao Gao
Shengfan Shen
Wulong Xiang
Weihao Gan
Zixing Zhang
12
0
0
09 Jun 2025
LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models
Wen Ding
Fan Qian
153
0
0
05 Jun 2025
Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
Yu Xi
Xiaoyu Gu
Haoyu Li
Jun Song
Bo Zheng
Kai Yu
18
0
0
30 May 2025
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Yangui Fang
Baixu Cheng
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
27
0
0
30 May 2025
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
Yujie Yang
Bing Yang
Xiaofei Li
12
0
0
26 May 2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
Haonan Zhang
Run Luo
Xiong Liu
Yuchuan Wu
Ting-En Lin
...
Min Yang
Lianli Gao
Jingkuan Song
Fei Huang
Yongbin Li
AI4CE
86
0
0
26 May 2025
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
Haoyang Zhang
Hexin Liu
Xiangyu Zhang
Qiquan Zhang
Yuchen Hu
Junqi Zhao
Fei Tian
Xuerui Yang
Eng Siong Chng
53
0
0
20 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
78
2
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM, VLM
181
13
0
25 Apr 2025
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
Nian Shao
Rui Zhou
Pengyu Wang
Xian Li
Ying Fang
Yujie Yang
Xiaofei Li
119
0
0
27 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
240
2
0
26 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM, AuLLM, CLIP, VLM
93
1
0
23 Feb 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
154
5
0
24 Jan 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
140
5
0
23 Jan 2025
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
Li Zhang
Jiyao Liu
Lei Xie
79
0
0
15 Jan 2025
Mining Word Boundaries from Speech-Text Parallel Data for Cross-domain Chinese Word Segmentation
Xuebin Wang
Lei Zhang
Zehan Li
Shilin Zhou
Chen Gong
Yang Hou
99
0
0
12 Dec 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lawrence Yunliang Chen
Hexiang Hu
Ruotong Wang
Yiran Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming-Hsuan Yang
Boqing Gong
38
3
0
16 Oct 2024
Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules
Hsin-Tien Chiang
Hao Zhang
Yong Xu
Meng Yu
Dong Yu
86
1
0
02 Oct 2024
Mamba for Streaming ASR Combined with Unimodal Aggregation
Ying Fang
Xiaofei Li
Mamba
52
4
0
30 Sep 2024
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Bingshen Mu
Kun Wei
Qijie Shao
Yong Xu
Lei Xie
MoE
114
2
0
30 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM, MLLM, VLM
181
29
0
26 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
164
19
0
23 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
45
0
0
19 Sep 2024
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data
Jing Xu
Daxin Tan
Jiaqi Wang
Xiao Chen
72
0
0
17 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
Shuiyun Liu
Yuxiang Kong
Pengcheng Guo
Weiji Zhuang
Peng Gao
Yujun Wang
Lei Xie
111
0
0
16 Sep 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
112
6
0
21 Jul 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM, VLM
103
162
0
15 Jul 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
60
8
0
14 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
119
28
0
05 Jul 2024
Romanization Encoding For Multilingual ASR
Wen Ding
Fei Jia
Hainan Xu
Yu Xi
Junjie Lai
Boris Ginsburg
60
0
0
05 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
139
57
0
04 Jul 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
101
3
0
27 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
43
3
0
26 Jun 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
134
2
0
20 Jun 2024
Transferable speech-to-text large language model alignment module
Boyong Wu
Chao Yan
Haoran Pu
45
0
0
19 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
128
12
0
17 Jun 2024
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
Ming Gao
Hang Chen
Jun Du
Xin Xu
Hongxiao Guo
Hui Bu
Jianxing Yang
Ming Li
Chin-Hui Lee
75
2
0
14 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
91
0
0
12 Jun 2024
Enhancing CTC-based speech recognition with diverse modeling units
Shiyi Han
Zhihong Lei
Mingbin Xu
Xingyu Na
Zhen Huang
83
0
0
05 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
83
7
0
03 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
137
45
0
26 May 2024
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng
Tianyi Xu
Kun Wei
Bingshen Mu
Hongfei Xue
...
Pengcheng Guo
Yuhang Dai
Longhao Li
Mingchen Shao
Lei Xie
82
12
0
03 May 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
77
2
0
02 May 2024
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu
Zhehuai Chen
Fei Jia
Boris Ginsburg
65
0
0
04 Apr 2024
Encoding of lexical tone in self-supervised models of spoken language
Gaofei Shen
Michaela Watkins
Afra Alishahi
Arianna Bisazza
Grzegorz Chrupala
79
8
0
25 Mar 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
131
18
0
19 Feb 2024
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
Hongfei Xue
Yuhao Liang
Bingshen Mu
Shiliang Zhang
Mengzhe Chen
Qian Chen
Lei Xie
AuLLM
89
11
0
31 Dec 2023