2206.08317
Cited By
v1
v2
v3 (latest)
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
16 June 2022
Zhifu Gao
Shiliang Zhang
Ian McLoughlin
Zhijie Yan
Papers citing
"Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition"
24 / 24 papers shown
Title
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
69
1
0
01 Jul 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
29
0
0
16 Jun 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang
B. Li
Bruce Wang
Boyong Wu
Chao Yan
...
X. Zhang
Yibo Zhu
Daxin Jiang
Shuchang Zhou
Chen-Hao Hu
AuLLM
49
0
0
10 Jun 2025
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML
SILM
17
0
0
10 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
17
0
0
10 Jun 2025
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Yangui Fang
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
Kai Yu
37
0
0
06 Jun 2025
Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
Longjie Luo
Shenghui Lu
Lin Li
Q. Hong
VLM
35
0
0
30 May 2025
SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
Longjie Luo
Lin Li
Q. Hong
25
0
0
30 May 2025
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
Qixi Zheng
Yushen Chen
Zhikang Niu
Ziyang Ma
Xiaofei Wang
Kai Yu
Xie Chen
48
0
0
26 May 2025
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
Ming Gao
Shilong Wu
Hang Chen
Jun Du
Chin-Hui Lee
Shinji Watanabe
Jingdong Chen
Sabato Marco Siniscalchi
O. Scharenborg
68
3
0
20 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
297
1
0
05 May 2025
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction
Xiaoliang Chen
Xin Yu
Le Chang
Yunhe Huang
Jiashuai He
...
Jin Li
Likai Lin
Ziyu Zeng
Xianling Tu
Shuyu Zhang
110
1
0
04 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
128
1
0
29 Apr 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
209
4
0
26 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
102
4
0
08 Feb 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
154
5
0
24 Jan 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Jiaxi Hu
Zuchao Li
Mengjia Shen
Haojun Ai
Sheng Li
Jun Zhang
82
0
0
20 Jan 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
Yangqiu Song
Yunxing Liu
Li Lu
82
0
0
15 Jan 2025
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics
Cong Cai
Shan Liang
Xuefei Liu
Kang Zhu
Zhengqi Wen
...
Zhenhua Cheng
Hanzhe Xu
Ruibo Fu
Bin Liu
Yongwei Li
65
3
0
17 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
71
1
0
03 Jul 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
Wenhan Yao
Jiangkun Yang
Yongqiang He
Jia Liu
Weiping Wen
85
3
0
16 Jun 2024
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
165
48
0
21 Mar 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Minglun Han
Feilong Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
83
13
0
30 Jan 2023
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
38
4
0
16 Oct 2022