AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

16 September 2017
Hui Bu
Jiayu Du
Xingyu Na
Bengu Wu
Hao Zheng
    CVBM
arXiv: 1709.05522

Papers citing "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline"

50 / 451 papers shown
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Yangui Fang
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
Kai Yu
157
4
0
24 Dec 2025
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Yangui Fang
Baixu Cheng
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
265
4
0
24 Dec 2025
VibOmni: Towards Scalable Bone-conduction Speech Enhancement on Earables
Lixing He
Yunqi Guo
Haozheng Hou
Zhenyu Yan
20
0
0
02 Dec 2025
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
Yuezhang Peng
Chonghao Cai
Ziang Liu
Shuai Fan
Sheng Jiang
...
Kele Xu
Y. Li
S. Wang
L. Qin
Xie Chen
AuLLM
104
0
0
01 Dec 2025
OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
Yuting Gao
Weihao Chen
L. xilinx Wang
Ruihan Xu
Q. Guo
MoE
104
0
0
24 Nov 2025
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Yuting Gao
Wang Lan
Hengyuan Zhao
Linjiang Huang
Si Liu
Q. Guo
MoE
156
0
0
23 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
563
1
0
16 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
510
2
0
31 Oct 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM, MoE, VLM
294
2
0
28 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
345
3
0
26 Oct 2025
M-CIF: Multi-Scale Alignment For CIF-Based Non-Autoregressive ASR
Ruixiang Mao
Xiangnan Ma
Qing Yang
Ziming Zhu
Yucheng Qiao
Yuan Ge
Tong Xiao
Shengxiang Gao
Zhengtao Yu
Jingbo Zhu
84
0
0
25 Oct 2025
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
Xiaohan Zhao
Hongyu Xiang
Shengze Ye
Song Li
Zhengkun Tian
Guanyu Chen
Ke Ding
Guanglu Wan
AuLLM
152
1
0
17 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
400
4
0
15 Oct 2025
End-to-end Speech Recognition with similar length speech and text
Peng Fan
Wenping Wang
Fei Deng
56
0
0
12 Oct 2025
LSZone: A Lightweight Spatial Information Modeling Architecture for Real-time In-car Multi-zone Speech Separation
Jun Chen
Shichao Hu
Jiuxin Lin
Wenjie Li
Zihan Zhang
...
Jinjiang Liu
Longshuai Xiao
Chao Weng
Lei Xie
Zhiyong Wu
102
0
0
12 Oct 2025
Drax: Speech Recognition with Discrete Flow Matching
Aviv Navon
Aviv Shamsian
Neta Glazer
Yael Segal-Feldman
Gill Hetz
Joseph Keshet
Ethan Fetaya
104
0
0
05 Oct 2025
Descriptor: Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)
Rahul Vijaykumar
Ajan Ahmed
John Parker
Dinesh Pendyala
Aidan Collins
Stephanie Schuckers
Masudul H. Imtiaz
61
0
0
30 Sep 2025
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
Tianrui Wang
Haoyu Wang
Meng Ge
Cheng Gong
Chunyu Qiang
...
Xiaobao Wang
Eng Siong Chng
Xie Chen
Longbiao Wang
Jianwu Dang
135
0
0
29 Sep 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Chengyao Wang
Zhisheng Zhong
Bohao Peng
Senqiao Yang
Yuqi Liu
Haokun Gui
Bin Xia
Jingyao Li
Bei Yu
Jiaya Jia
MLLM, AuLLM, VLM
151
1
0
29 Sep 2025
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Guojian Li
C. Wang
Hongfei Xue
Shuiyuan Wang
Dehui Gao
...
Yuke Lin
W. Li
Longshuai Xiao
Zhonghua Fu
Lei Xie
68
0
0
28 Sep 2025
VoiceBridge: Designing Latent Bridge Models for General Speech Restoration at Scale
Chi Zhang
Zehua Chen
Kaiwen Zheng
Jun Zhu
AuLLM
158
0
0
28 Sep 2025
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Yuhan Song
Linhao Zhang
Chuhan Wu
Aiwei Liu
Wei Jia
Houfeng Wang
Xiao-bin Zhou
121
0
0
26 Sep 2025
WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
Binbin Zhang
Chengdong Liang
Shuai Wang
Xuelong Geng
Zhao Guo
...
Hao Yin
XiPeng Yang
Pengshen Zhang
Changwei Ma
Lei Xie
AuLLM, VLM
399
0
0
24 Sep 2025
SwissGPC v1.0 -- The Swiss German Podcasts Corpus
Samuel Stucki
Mark Cieliebak
Jan Deriu
76
0
0
24 Sep 2025
UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition
Ying Fang
Xiaofei Li
88
0
0
18 Sep 2025
PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
Li Fu
Yu Xin
Sunlu Zeng
Lu Fan
Youzheng Wu
Xiaodong He
99
0
0
16 Sep 2025
Fun-ASR Technical Report
Keyu An
Yanni Chen
Chong Deng
Changfeng Gao
Zhifu Gao
...
Kun Zou
Han Zhao
Shengkui Zhao
Jingren Zhou
Yanqiao Zhu
AuLLM
234
1
0
15 Sep 2025
Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Yue Gu
Zhihao Du
Ying Shi
Shiliang Zhang
Qian Chen
Jiqing Han
97
0
0
07 Sep 2025
New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
84
0
0
06 Sep 2025
PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation
Jiajun He
Naoki Sawada
Koichi Miyazaki
Tomoki Toda
212
0
0
04 Sep 2025
Contextualized Token Discrimination for Speech Search Query Correction
Junyu Lu
Di Jiang
Mengze Hong
Victor Junqiu Wei
Qintian Guo
Zhiyang Su
108
1
0
04 Sep 2025
Analysing the Language of Neural Audio Codecs
J. S. Park
Shinnosuke Takamichi
David M. Chan
Shunsuke Kando
Yuki Saito
Hiroshi Saruwatari
72
0
0
01 Sep 2025
Generative Annotation for ASR Named Entity Correction
Yuanchang Luo
Daimeng Wei
Shaojun Li
Hengchao Shang
Jiaxin Guo
...
Zhanglin Wu
Xiaoyu Chen
Zhiqiang Rao
Jinlong Yang
Hao Yang
127
0
0
28 Aug 2025
Beyond Transcription: Mechanistic Interpretability in ASR
Neta Glazer
Yael Segal-Feldman
Hilit Segev
Aviv Shamsian
Asaf Buchnick
Gill Hetz
Ethan Fetaya
Joseph Keshet
Aviv Navon
92
0
0
21 Aug 2025
Any-to-any Speaker Attribute Perturbation for Asynchronous Voice Anonymization
IEEE Transactions on Information Forensics and Security (TIFS), 2025
Liping Chen
Chenyang Guo
Rui Wang
Kong Aik Lee
Zhenhua Ling
AAML
72
1
0
21 Aug 2025
MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
Xuwen Yang
96
0
0
20 Aug 2025
OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue
Xuelong Geng
Qijie Shao
Hongfei Xue
Shuiyuan Wang
Hanke Xie
...
Longhao Li
Yuhang Dai
Dehui Gao
Dake Guo
Lei Xie
AuLLM
168
5
0
13 Aug 2025
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
A. F. M. Saif
Lisha Chen
Xiaodong Cui
Songtao Lu
Brian Kingsbury
Tianyi Chen
77
0
0
12 Aug 2025
Efficient Scaling for LLM-based ASR
Bingshen Mu
Yiwen Shao
Kun Wei
Dong Yu
Lei Xie
AuLLM
150
4
0
06 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
342
12
0
06 Aug 2025
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wen Huang
Yanmei Gu
Zhiming Wang
Huijia Zhu
Yanmin Qian
169
4
0
29 Jul 2025
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes
Zhou Feng
Jiahao Chen
Chunyi Zhou
Yuwen Pu
Qingming Li
Xuhong Zhang
S. Ji
AAML
192
5
0
17 Jul 2025
IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Zeyang Song
Shimin Zhang
Yuhong Chou
Jibin Wu
Haizhou Li
218
0
0
10 Jul 2025
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
He Wang
Linhan Ma
Dake Guo
Xiong Wang
Lei Xie
Jin Xu
Junyang Lin
AuLLM
225
4
0
08 Jul 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou
Yiquan Zhou
Yi He
Xun Zhou
Jinchao Wang
Wei Deng
Jingchen Shu
DiffM
163
14
0
23 Jun 2025
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan
Ngoc-Quan Pham
Alexander Waibel
CLL, MoMe
122
2
0
19 Jun 2025
Manipulated Regions Localization For Partially Deepfake Audio: A Survey
Jiayi He
Jiangyan Yi
Jianhua Tao
Siding Zeng
Hao Gu
173
2
0
17 Jun 2025
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel
Zhiyong Yan
Tianzi Wang
Yongqing Wang
Xingwei Sun
Yadong Niu
Jizhong Liu
Gang Li
Junbo Zhang
Jian Luan
CLIP, VLM
195
3
0
12 Jun 2025
DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
Haotian Guo
Jing Han
Yongfeng Tu
Shihao Gao
Shengfan Shen
Wulong Xiang
Weihao Gan
Zixing Zhang
116
0
0
09 Jun 2025
LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data
Wen Ding
Fan Qian
304
0
0
05 Jun 2025