Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

16 June 2022
Zhifu Gao
Shiliang Zhang
Ian McLoughlin
Zhijie Yan

Papers citing "Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition"

24 / 24 papers shown
Title
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
69
1
0
01 Jul 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
29
0
0
16 Jun 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang
B. Li
Bruce Wang
Boyong Wu
Chao Yan
...
X. Zhang
Yibo Zhu
Daxin Jiang
Shuchang Zhou
Chen-Hao Hu
AuLLM
49
0
0
10 Jun 2025
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML, SILM
17
0
0
10 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
17
0
0
10 Jun 2025
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Yangui Fang
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
Kai Yu
37
0
0
06 Jun 2025
Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
Longjie Luo
Shenghui Lu
Lin Li
Q. Hong
VLM
35
0
0
30 May 2025
SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
Longjie Luo
Lin Li
Q. Hong
25
0
0
30 May 2025
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
Qixi Zheng
Yushen Chen
Zhikang Niu
Ziyang Ma
Xiaofei Wang
Kai Yu
Xie Chen
48
0
0
26 May 2025
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
Ming Gao
Shilong Wu
Hang Chen
Jun Du
Chin-Hui Lee
Shinji Watanabe
Jingdong Chen
Sabato Marco Siniscalchi
O. Scharenborg
68
3
0
20 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
297
1
0
05 May 2025
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction
Xiaoliang Chen
Xin Yu
Le Chang
Yunhe Huang
Jiashuai He
...
Jin Li
Likai Lin
Ziyu Zeng
Xianling Tu
Shuyu Zhang
110
1
0
04 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
128
1
0
29 Apr 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM, VLM
209
4
0
26 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
102
4
0
08 Feb 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
154
5
0
24 Jan 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Jiaxi Hu
Zuchao Li
Mengjia Shen
Haojun Ai
Sheng Li
Jun Zhang
82
0
0
20 Jan 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
Yangqiu Song
Yunxing Liu
Li Lu
82
0
0
15 Jan 2025
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics
Cong Cai
Shan Liang
Xuefei Liu
Kang Zhu
Zhengqi Wen
...
Zhenhua Cheng
Hanzhe Xu
Ruibo Fu
Bin Liu
Yongwei Li
65
3
0
17 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
71
1
0
03 Jul 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
Wenhan Yao
Jiangkun Yang
Yongqiang He
Jia Liu
Weiping Wen
85
3
0
16 Jun 2024
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
165
48
0
21 Mar 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Minglun Han
Feilong Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
83
13
0
30 Jan 2023
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
38
4
0
16 Oct 2022