
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

10 December 2020

Binbin Zhang

Chao Yang

Liyong Guo

Yaguang Hu

Lei Xie

X. Lei


Papers citing "Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition"


Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

370

24 Dec 2025

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization

Yun Tang

Cindy Tseng

132

19 Sep 2025

In-domain SSL pre-training and streaming ASR

135

15 Sep 2025

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding

...

Michele M. Franceschini

209

13 Jun 2025

Mamba for Streaming ASR Combined with Unimodal Aggregation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ying Fang

Xiaofei Li

Mamba

265

30 Sep 2024

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

281

14 Jul 2024

A framework of text-dependent speaker verification for chinese numerical string corpus

334

11 May 2024

Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition

IEEE International Conference on Multimedia and Expo (ICME), 2024

363

13 Mar 2024

R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation

Jiaxin Guo

Shaojun Li

207

11 Jan 2024

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiaxin Guo

...

Min Zhang

206

11 Jan 2024

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Automatic Speech Recognition & Understanding (ASRU), 2023

Lei Xie

233

15 Dec 2023

CDSD: Chinese Dysarthria Speech Database

Interspeech (Interspeech), 2023

421

24 Oct 2023

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

357

15 Sep 2023

Improving CTC-AED model with integrated-CTC and auxiliary loss regularization

Daobin Zhu

Xiangdong Su

Hongbin Zhang

258

15 Aug 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

197

05 Aug 2023

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

International Conference on Neural Information Processing (ICONIP), 2023

Zeping Min

Jinbo Wang

AuLLM

218

13 Jul 2023

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

Interspeech (Interspeech), 2023

Tianzi Wang

Shoukang Hu

Jiajun Deng

Zengrui Jin

270

27 Jun 2023

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR

Interspeech (Interspeech), 2023

207

13 Jun 2023

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

Interspeech (Interspeech), 2023

224

01 Jun 2023

Perception and Semantic Aware Regularization for Sequential Confidence Calibration

Computer Vision and Pattern Recognition (CVPR), 2023

Shuangping Huang

316

31 May 2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

Interspeech (Interspeech), 2023

Pengcheng Zhu

Shuai Wang

201

21 May 2023

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

Interspeech (Interspeech), 2023

Binbin Zhang

Zhiyong Wu

203

18 May 2023

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

196

18 Apr 2023

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Neural Networks (Neural Netw.), 2023

Leyuan Qu

C. Weber

S. Wermter

176

20 Feb 2023

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

273

28 Nov 2022

SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution

IEEE Signal Processing Letters (SPL), 2022

Fangyuan Wang

Bo Xu

342

21 Nov 2022

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Binbin Zhang

161

02 Nov 2022

Delay-penalized transducer for low-latency streaming ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Wei Kang

Zengwei Yao

Fangjun Kuang

Liyong Guo

Xiaoyu Yang

Long Lin

Piotr Żelasko

Daniel Povey

304

31 Oct 2022

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Liyong Guo

Xiaoyu Yang

Quandong Wang

Yuxiang Kong

Zengwei Yao

...

Wei Kang

Long Lin

254

31 Oct 2022

Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Yuxuan Wang

256

27 Oct 2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022

130

25 Oct 2022

Improving Mandarin Speech Recogntion with Block-augmented Transformer

268

24 Jul 2022

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

Interspeech (Interspeech), 2022

246

06 Jul 2022

Language-specific Characteristic Assistance for Code-switching Speech Recognition

Interspeech (Interspeech), 2022

Tongtong Song

Qiang Xu

Meng Ge

Longbiao Wang

Hao Shi

Yongjie Lv

Yuqin Lin

Jianwu Dang

229

29 Jun 2022

Streaming non-autoregressive model for any-to-many voice conversion

Ziyi Chen

Haoran Miao

Pengyuan Zhang

159

15 Jun 2022

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

...

Dianhai Yu

159

20 May 2022

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR

International Conference on Neural Information Processing (ICONIP), 2022

Fangyuan Wang

Bo Xu

216

29 Mar 2022

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

...

Bertram E. Shi

441

12 Dec 2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

Automatic Speech Recognition & Understanding (ASRU), 2021

266

27 Sep 2021

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Xiaodan Liang

227

19 Sep 2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory

ACM Multimedia (ACM MM), 2021

Zhou Zhao

Xingshan Zeng

171

31 Aug 2021

Decoupling recognition and transcription in Mandarin ASR

209

02 Aug 2021

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

Di Wu

Binbin Zhang

Chao Yang

323

10 Jun 2021

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

187

08 Apr 2021

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

Interspeech (Interspeech), 2021

Binbin Zhang

Chao Yang

Lei Xie

444

313

02 Feb 2021