WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

29 March 2022
Binbin Zhang
Di Wu
Zhendong Peng
Xingchen Song
Zhuoyuan Yao
Hang Lv
Lei Xie
Chao Yang
Fuping Pan
Jianwei Niu
    VLM

Papers citing "WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit"

50 / 50 papers shown
Title
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Khanh Le
Duc Thanh Chau
AI4TS
66
0
0
24 Feb 2025
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
54
0
0
20 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
53
0
0
17 Feb 2025
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
45
2
0
16 Oct 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
37
8
0
24 Sep 2024
Retrieval Augmented Correction of Named Entity Speech Recognition Errors
Ernest Pusateri
Anmol Walia
Anirudh Kashi
Bortik Bandyopadhyay
Nadia Hyder
Sayantan Mahinder
R. Anantha
Daben Liu
Sashank Gondala
RALM
3DV
26
2
0
09 Sep 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
36
1
0
05 Sep 2024
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
Hukai Huang
Jiayan Lin
K. Wang
Yishuang Li
Wenhao Guan
Lin Li
Q. Hong
MoE
29
0
0
03 Sep 2024
HydraFormer: One Encoder For All Subsampling Rates
Yaoxun Xu
Xingchen Song
Zhiyong Wu
Di Wu
Zhendong Peng
Binbin Zhang
23
0
0
08 Aug 2024
Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing
Hukai Huang
Shenghui Lu
Yahui Shan
He Qu
Wenhao Guan
Q. Hong
Lin Li
MoE
25
0
0
26 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
21
1
0
03 Jul 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
40
2
0
27 Jun 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen
Zehua Liu
Xiaolou Li
Lantian Li
D. Wang
27
2
0
14 Jun 2024
Enhancing CTC-based speech recognition with diverse modeling units
Shiyi Han
Zhihong Lei
Mingbin Xu
Xingyu Na
Zhen Huang
33
0
0
05 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
J. Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Y. Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
49
75
0
04 Jun 2024
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng
Tianyi Xu
Kun Wei
Bingshen Mu
Hongfei Xue
...
Pengcheng Guo
Yuhang Dai
Longhao Li
Mingchen Shao
Lei Xie
36
9
0
03 May 2024
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Xingchen Song
Di Wu
Binbin Zhang
Dinghao Zhou
Zhendong Peng
Bo Dang
Fuping Pan
Chao Yang
MoE
31
5
0
25 Apr 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
24
1
0
04 Mar 2024
R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation
Jiaxin Guo
Zhanglin Wu
Zongyao Li
Hengchao Shang
Daimeng Wei
Xiaoyu Chen
Zhiqiang Rao
Shaojun Li
Hao-Yu Yang
30
1
0
11 Jan 2024
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Jiaxin Guo
Minghan Wang
Xiaosong Qiao
Daimeng Wei
Hengchao Shang
...
Yinglu Li
Chang Su
Min Zhang
Shimin Tao
Hao-Yu Yang
23
6
0
11 Jan 2024
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
Junfeng Hou
Peiyao Wang
Jincheng Zhang
Meng-Da Yang
Minwei Feng
Jingcheng Yin
27
1
0
04 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
18
0
0
03 Jan 2024
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
17
4
0
16 Dec 2023
RdimKD: Generic Distillation Paradigm by Dimensionality Reduction
Yi Guo
Yiqian He
Xiaoyang Li
Haotong Qin
Van Tung Pham
Yang Zhang
Shouda Liu
43
1
0
14 Dec 2023
The GUA-Speech System Description for CNVSRC Challenge 2023
Shengqiang Li
Chao Lei
Baozhong Ma
Binbin Zhang
Fuping Pan
21
0
0
12 Dec 2023
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition
Peng Fan
Changhao Shan
Sining Sun
Qing Yang
Jianwei Zhang
22
3
0
23 Oct 2023
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao
Liyong Guo
Xiaoyu Yang
Wei Kang
Fangjun Kuang
Yifan Yang
Zengrui Jin
Long Lin
Daniel Povey
VLM
25
64
0
17 Oct 2023
Low-latency Speech Enhancement via Speech Token Generation
Huaying Xue
Xiulian Peng
Yan Lu
19
0
0
13 Oct 2023
Acoustic Model Fusion for End-to-end Speech Recognition
Zhihong Lei
Mingbin Xu
Shiyi Han
Leo Liu
Zhen Huang
...
Yuanyuan Zhang
Ernest Pusateri
Mirko Hannemann
Yaqiao Deng
Man-Hung Siu
11
5
0
10 Oct 2023
PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System
Xiang Lyu
Yuhang Cao
Qing Wang
Jingjing Yin
Yuguang Yang
Pengpeng Zou
G. Zachmann
Heng Lu
VLM
26
3
0
28 Sep 2023
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Masao Someki
N. Eng
Yosuke Higuchi
Shinji Watanabe
13
0
0
26 Sep 2023
Cross-modal Alignment with Optimal Transport for CTC-based ASR
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
22
4
0
24 Sep 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
16
6
0
12 Jul 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
14
1
0
14 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
C. Li
Lei Xie
14
10
0
01 Jun 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Lei Xie
24
16
0
21 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
29
54
0
18 May 2023
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
Xingchen Song
Di Wu
Binbin Zhang
Zhendong Peng
Bo Dang
Fuping Pan
Zhiyong Wu
27
20
0
18 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
19
1
0
09 May 2023
Spatio-Temporal driven Attention Graph Neural Network with Block Adjacency matrix (STAG-NN-BA)
U. Nazir
W. Islam
M. Taj
16
3
0
25 Mar 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment
Huaying Xue
Xiulian Peng
Yan Lu
41
4
0
26 Feb 2023
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
Fangyuan Wang
Bo Xu
Bo Xu
29
0
0
21 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
22
10
0
14 Nov 2022
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
Che-Yuan Liang
Xiao-Lei Zhang
BinBin Zhang
Di Wu
Shengqiang Li
Xingchen Song
Zhendong Peng
Fuping Pan
16
8
0
02 Nov 2022
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
Xingchen Song
Di Wu
Zhiyong Wu
Binbin Zhang
Yuekai Zhang
Zhendong Peng
Wenpeng Li
Fuping Pan
Changbao Zhu
21
8
0
01 Nov 2022
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
21
115
0
31 Oct 2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge
Yuhao Liang
Pei-Ning Chen
F. Yu
Xinfa Zhu
Tianyi Xu
Lei Xie
21
0
0
26 Oct 2022
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Pranay Dighe
Prateeth Nayak
Oggi Rudovic
Erik Marchi
Xiaochuan Niu
Ahmed H. Tewfik
47
4
0
21 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
23
1
0
14 Oct 2022