Improved training of end-to-end attention models for speech recognition

8 May 2018
Albert Zeyer
Kazuki Irie
Ralf Schluter
Hermann Ney
    VLM

Papers citing "Improved training of end-to-end attention models for speech recognition"

50 / 55 papers shown
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schluter
Hermann Ney
36
0
0
01 Oct 2024
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
Khai Le-Duc
Phuc Phan
Tan-Hanh Pham
Bach Phan Tat
Minh-Huong Ngo
Chris Ngo
Thanh Nguyen-Tang
Truong-Son Hy
LM&MA
48
0
0
21 Sep 2024
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
29
17
0
18 May 2023
Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Zeyu Ren
Nurmemet Yolwas
Huiru Wang
Wushour Slamu
21
0
0
22 Mar 2023
UML: A Universal Monolingual Output Layer for Multilingual ASR
Chaoyang Zhang
Bo-wen Li
Tara N. Sainath
Trevor Strohman
Shuo-yiin Chang
36
7
0
22 Feb 2023
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
37
12
0
10 Nov 2022
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara
Shinji Watanabe
AI4TS
23
9
0
04 Nov 2022
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li
Dongxing Xu
Haoran Wei
Yanhua Long
21
2
0
03 Nov 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schluter
Hermann Ney
16
8
0
26 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
30
143
0
06 Jul 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
C. Zhang
P. Woodland
32
14
0
18 May 2022
A Likelihood Ratio based Domain Adaptation Method for E2E Models
Chhavi Choudhury
Ankur Gandhe
Xiaohan Ding
I. Bulyko
27
10
0
10 Jan 2022
Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter William VanHarn Plantinga
Deblin Bagchi
Eric Fosler-Lussier
46
10
0
11 Dec 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schluter
Hermann Ney
33
12
0
12 Apr 2021
HMM-Free Encoder Pre-Training for Streaming RNN Transducer
Lu Huang
J. Sun
Yu Tang
Junfeng Hou
Jinkun Chen
Jun Zhang
Zejun Ma
25
3
0
02 Apr 2021
A study of latent monotonic attention variants
Albert Zeyer
Ralf Schluter
Hermann Ney
24
5
0
30 Mar 2021
Arabic Speech Recognition by End-to-End, Modular Systems and Human
A. Hussein
Shinji Watanabe
Ahmed M. Ali
VLM
16
47
0
21 Jan 2021
Weakly Supervised Label Smoothing
Gustavo Penha
C. Hauff
11
3
0
15 Dec 2020
Deep Shallow Fusion for RNN-T Personalization
Duc Le
Gil Keren
Julian Chan
Jay Mahadeokar
Christian Fuegen
M. Seltzer
21
77
0
16 Nov 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
F. Weninger
F. Mana
R. Gemello
Jesús Andrés-Ferrer
P. Zhan
25
30
0
27 Jul 2020
Automatic Speech Recognition Benchmark for Air-Traffic Communications
Juan Pablo Zuluaga
P. Motlícek
Qingran Zhan
Karel Veselý
Rudolf A. Braun
15
32
0
18 Jun 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
21
114
0
20 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
21
8
0
15 Feb 2020
End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection
Takenori Yoshimura
Tomoki Hayashi
K. Takeda
Shinji Watanabe
37
49
0
03 Feb 2020
Audio-Visual Decision Fusion for WFST-based and seq2seq Models
R. Aralikatti
Sharad Roy
Abhinav Thanda
D. Margam
Pujitha Appan Kandala
Tanay Sharma
S. Venkatesan
19
1
0
29 Jan 2020
Effective Data Augmentation with Multi-Domain Learning GANs
Shin'ya Yamaguchi
Sekitoshi Kanai
Takeharu Eda
27
27
0
25 Dec 2019
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
18
83
0
19 Dec 2019
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
11
98
0
29 Oct 2019
Towards Online End-to-end Transformer Automatic Speech Recognition
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
22
32
0
25 Oct 2019
Recognizing long-form speech using streaming end-to-end models
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman
29
129
0
24 Oct 2019
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR
Duc Le
T. Koehler
Christian Fuegen
M. Seltzer
30
16
0
22 Oct 2019
Transformer ASR with Contextual Block Processing
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
59
64
0
16 Oct 2019
One-To-Many Multilingual End-to-end Speech Translation
Mattia Antonino Di Gangi
Matteo Negri
Marco Turchi
33
50
0
08 Oct 2019
Self-Training for End-to-End Speech Recognition
Jacob Kahn
Ann Lee
Awni Y. Hannun
SSL
27
231
0
19 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
Survey on Deep Neural Networks in Speech and Vision Systems
M. Alam
Manar D. Samad
Lasitha Vidyaratne
Alexander M. Glandon
Khan M. Iftekharuddin
3DV
VLM
AI4TS
34
205
0
16 Aug 2019
IMS-Speech: A Speech to Text Tool
Pavel Denisov
Ngoc Thang Vu
19
11
0
13 Aug 2019
Attention model for articulatory features detection
I. Karaulov
Dmytro Tkanov
14
6
0
02 Jul 2019
Word-level Speech Recognition with a Letter to Word Encoder
R. Collobert
Awni Y. Hannun
Gabriel Synnaeve
3DV
22
4
0
10 Jun 2019
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Linhao Dong
Bo Xu
27
125
0
27 May 2019
Acoustic-to-Word Models with Conversational Context Information
Suyoun Kim
Florian Metze
22
7
0
21 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schluter
Hermann Ney
KELM
41
171
0
10 May 2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
Christoph Luscher
Eugen Beck
Kazuki Irie
M. Kitza
Wilfried Michel
Albert Zeyer
Ralf Schluter
Hermann Ney
VLM
13
234
0
08 May 2019
Very Deep Self-Attention Networks for End-to-End Speech Recognition
Ngoc-Quan Pham
T. Nguyen
Jan Niehues
Markus Müller
Sebastian Stüker
A. Waibel
23
161
0
30 Apr 2019
A spelling correction model for end-to-end speech recognition
Jinxi Guo
Tara N. Sainath
Ron J. Weiss
AuLLM
KELM
32
139
0
19 Feb 2019
Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
Bo-wen Li
Yu Zhang
Tara N. Sainath
Yonghui Wu
William Chan
AuLLM
22
129
0
22 Nov 2018
Analysis of Multilingual Sequence-to-Sequence speech recognition systems
Jiayang Liu
M. Baskar
Weiming Zhang
Takaaki Hori
Matthew Wiesner
Jan "Honza" Cernocký
33
18
0
07 Nov 2018
Transfer learning of language-independent end-to-end ASR with language model fusion
S. Hariri
Jaejin Cho
M. Baskar
Tatsuya Kawahara
R. Brunner
8
42
0
06 Nov 2018
On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition
Zhiping Zeng
Yerbolat Khassanov
Van Tung Pham
Haihua Xu
Chng Eng Siong
Haizhou Li
16
92
0
01 Nov 2018