2303.03329
End-to-End Speech Recognition: A Survey
3 March 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
Papers citing "End-to-End Speech Recognition: A Survey"
29 / 29 papers shown
Title
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization
Haaris Mehmood
Karthikeyan P. Saravanan
Pablo Peso Parada
David Tuckey
Mete Ozay
Gil Ho Lee
Jungin Lee
Seokyeong Jung
52
0
0
12 Mar 2025
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Yiming Li
Kaiying Yan
Shuo Shao
Tongqing Zhai
Shu-Tao Xia
Z. Qin
D. Tao
AAML
68
0
0
02 Mar 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMe
VLM
71
0
0
24 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
26
0
0
06 Feb 2025
Classification Error Bound for Low Bayes Error Conditions in Machine Learning
Zijian Yang
Vahe Eminyan
Ralf Schluter
Hermann Ney
31
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
66
4
0
24 Jan 2025
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jiaming Zhou
S. Zhao
Hui Wang
Tian-Hao Zhang
Haoqin Sun
Xuechen Wang
Yong Qin
148
3
0
20 Jan 2025
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
Amin Robatian
Mohammad Hajipour
Mohammad Reza Peyghan
Fatemeh Rajabi
Sajjad Amini
Shahrokh Ghaemmaghami
Iman Gholampour
41
0
0
18 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
79
2
0
10 Jan 2025
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
Tina Raissi
Christoph Luscher
Simon Berger
Ralf Schluter
Hermann Ney
20
2
0
16 Jul 2024
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Dena F. Mujtaba
N. Mahapatra
Megan Arney
J Scott Yaruss
Hope Gerlach-Houck
Caryn Herring
Jia Bin
32
0
0
10 May 2024
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
O. Kundacina
V. Vincan
D. Mišković
BDL
91
0
0
03 May 2024
Semi-Autoregressive Streaming ASR With Label Context
Siddhant Arora
G. Saon
Shinji Watanabe
Brian Kingsbury
AI4TS
13
5
0
19 Sep 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
14
1
0
19 Aug 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Matías P. Pizarro
D. Kolossa
Asja Fischer
AAML
17
1
0
26 May 2023
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
47
104
0
30 Sep 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
110
192
0
14 Oct 2021
CTC Variations Through New WFST Topologies
A. Laptev
Somshubra Majumdar
Boris Ginsburg
24
20
0
06 Oct 2021
MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition
Linghui Meng
Jin Xu
Xu Tan
Jindong Wang
Tao Qin
Bo Xu
VLM
62
75
0
25 Feb 2021
Intermediate Loss Regularization for CTC-based Speech Recognition
Jaesong Lee
Shinji Watanabe
111
135
0
05 Feb 2021
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
41
35
0
12 Dec 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
49
168
0
21 Oct 2020
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Chao Weng
Chengzhu Yu
Jia Cui
Chunlei Zhang
Dong Yu
67
39
0
28 Nov 2019
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
115
165
0
16 Jul 2017
Six Challenges for Neural Machine Translation
Philipp Koehn
Rebecca Knowles
AAML
AIMat
208
1,202
0
12 Jun 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
247
9,042
0
06 Jun 2015
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
243
7,597
0
03 Jul 2012