Deep Speech: Scaling up end-to-end speech recognition

17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 768 papers shown
AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
IEEE Transactions on Dependable and Secure Computing (TDSC), 2022
Guangke Chen
Zhe Zhao
Fu Song
Sen Chen
Lingling Fan
Yang Liu
AAML
178
22
0
07 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM, MoE
176
16
0
07 Jun 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
IEEE International Joint Conference on Neural Network (IJCNN), 2022
Jian Luo
Jianzong Wang
Ning Cheng
Haobin Tang
Jing Xiao
SSL
174
2
0
28 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuting Yang
Yuke Li
Binbin Du
318
15
0
25 May 2022
Deep Learning for Visual Speech Analysis: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Yike Guo
Xin Xu
M. Pietikäinen
Tianpeng Liu
VLM
314
53
0
22 May 2022
Cardinality-Minimal Explanations for Monotonic Neural Networks
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Ouns El Harzli
Bernardo Cuenca Grau
Ian Horrocks
FAtt
279
7
0
19 May 2022
Emotion-Controllable Generalized Talking Face Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Sanjana Sinha
S. Biswas
Ravindra Yadav
Brojeshwar Bhowmick
CVBM
149
62
0
02 May 2022
A Novel Speech-Driven Lip-Sync Model with CNN and LSTM
Xiaohong Li
Xiang Wang
Kai Wang
Kai Wang
129
4
0
02 May 2022
Extricating IoT Devices from Vendor Infrastructure with Karl
Gina Yuan
David Mazières
Matei A. Zaharia
184
5
0
28 Apr 2022
Improving Self-Supervised Learning-based MOS Prediction Networks
Bálint Gyires-Tóth
Csaba Zainkó
SSL
110
1
0
23 Apr 2022
Adversarial Scratches: Deployable Attacks to CNN Classifiers
Pattern Recognition (Pattern Recogn.), 2022
Loris Giulivi
Malhar Jere
Loris Rossi
F. Koushanfar
Gabriela F. Cretu-Ciocarlie
Briland Hitaj
Giacomo Boracchi
AAML
234
23
0
20 Apr 2022
STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation
Saad Naeem
Omer Beg
99
1
0
16 Apr 2022
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
Interspeech (Interspeech), 2022
Shaojin Ding
Weiran Wang
Ding Zhao
Tara N. Sainath
Yanzhang He
...
Qiao Liang
Dongseong Hwang
Ian McGraw
Rohit Prabhavalkar
Trevor Strohman
118
17
0
13 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
IEEE Access (IEEE Access), 2022
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
212
52
0
06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition
Neural Networks (NN), 2022
Federico Adolfi
J. Bowers
David Poeppel
UQCV
276
26
0
06 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN
Neural Information Processing Systems (NeurIPS), 2022
Minsu Kim
Joanna Hong
Y. Ro
225
67
0
04 Apr 2022
Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents
P. Dubey
B. Shah
38
18
0
03 Apr 2022
Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xuandi Fu
Feng-Ju Chang
Martin H. Radfar
Kailin Wei
Jing Liu
Grant P. Strimel
Kanthashree Mysore Sathyendra
131
4
0
01 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Interspeech (Interspeech), 2022
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
266
58
0
01 Apr 2022
Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
130
7
0
01 Apr 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Interspeech (Interspeech), 2022
Jaesong Lee
Lukas Lee
Shinji Watanabe
285
8
0
31 Mar 2022
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Interspeech (Interspeech), 2022
Huahuan Zheng
Keyu An
Zhijian Ou
Chen Huang
Ke Ding
Guanglu Wan
194
5
0
31 Mar 2022
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?
Priyanshi Shah
Harveen Singh Chadha
Anirudh Gupta
Ankur Dhuriya
Neeraj Chhimwal
Rishabh Gaur
Vivek Raghavan
170
1
0
30 Mar 2022
Improving Speech Recognition for Indic Languages using Language Model
Ankur Dhuriya
Harveen Singh Chadha
Anirudh Gupta
Priyanshi Shah
Neeraj Chhimwal
Rishabh Gaur
Vivek Raghavan
120
2
0
30 Mar 2022
4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Interspeech (Interspeech), 2022
Shaojin Ding
Phoenix Meadowlark
Yanzhang He
Lukasz Lew
Shivani Agrawal
Oleg Rybakov
MQ
375
44
0
29 Mar 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chen Chen
Nana Hou
Yuchen Hu
Shashank Shirol
Chng Eng Siong
NoLa
201
48
0
29 Mar 2022
WaveFuzz: A Clean-Label Poisoning Attack to Protect Your Voice
Yunjie Ge
Qianqian Wang
Jingfeng Zhang
Juntao Zhou
Yunzhu Zhang
Chao Shen
AAML
224
8
0
25 Mar 2022
Learning by non-interfering feedback chemical signaling in physical networks
Physical Review Research (Phys. Rev. Res.), 2022
Vidyesh Rao Anisetti
B. Scellier
J. M. Schwarz
127
23
0
22 Mar 2022
Neural Predictor for Black-Box Adversarial Attacks on Speech Recognition
Marie Biolková
Bac Nguyen
AAML
157
2
0
18 Mar 2022
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Findings (Findings), 2022
Tejas Gokhale
Swaroop Mishra
Man Luo
Bhavdeep Singh Sachdeva
Chitta Baral
205
33
0
15 Mar 2022
aaeCAPTCHA: The Design and Implementation of Audio Adversarial CAPTCHA
European Symposium on Security and Privacy (Euro S&P), 2022
Md. Imran Hossen
X. Hei
139
9
0
05 Mar 2022
A Survey of Multilingual Models for Automatic Speech Recognition
International Conference on Language Resources and Evaluation (LREC), 2022
Hemant Yadav
Sunayana Sitaram
167
47
0
25 Feb 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Imran Razzak
Kevin Lee
Chetan Arora
Ali Hassani
A. Zaslavsky
AAML
178
8
0
22 Feb 2022
Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments
Mario Esparza
167
0
0
21 Feb 2022
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Alexander Isenko
R. Mayer
Jeffrey Jedele
Hans-Arno Jacobsen
325
28
0
17 Feb 2022
Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chao-Han Huck Yang
Zeeshan Ahmed
Yile Gu
Joseph Szurley
Roger Ren
Linda Liu
A. Stolcke
I. Bulyko
AAML
191
4
0
17 Feb 2022
Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference
IEEE Transactions on Circuits and Systems Part 1: Regular Papers (TCAS-I), 2021
G. Paulin
Francesco Conti
Lukas Cavigelli
Luca Benini
147
14
0
14 Feb 2022
I'm Hearing (Different) Voices: Anonymous Voices to Protect User Privacy
H.C.M. Turner
Giulio Lovisotto
Simon Eberz
Ivan Martinovic
86
1
0
13 Feb 2022
FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation
Yuantian Miao
Chao Chen
Lei Pan
Jun Zhang
Yang Xiang
AAML
194
4
0
11 Feb 2022
Convergence of a New Learning Algorithm
Feng Lin
3DV
104
0
0
08 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
International Conference on Language Resources and Evaluation (LREC), 2022
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
153
9
0
01 Feb 2022
Visualizing Automatic Speech Recognition -- Means for a Better Understanding?
Karla Markert
Romain Parracone
Mykhailo Kulakov
Philip Sperl
Ching-yu Kao
Konstantin Böttinger
215
11
0
01 Feb 2022
Language Dependencies in Adversarial Attacks on Speech Recognition Systems
Karla Markert
Donika Mirdita
Konstantin Böttinger
AAML, SILM
196
3
0
01 Feb 2022
Unicorn: Reasoning about Configurable System Performance through the lens of Causality
European Conference on Computer Systems (EuroSys), 2022
Md Shahriar Iqbal
R. Krishna
Mohammad Ali Javidian
Baishakhi Ray
Pooyan Jamshidi
LRM
237
34
0
20 Jan 2022
iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection
AAAI Conference on Artificial Intelligence (AAAI), 2022
R. Kaur
Susmit Jha
Anirban Roy
Sangdon Park
Guang Cheng
O. Sokolsky
Insup Lee
OODD
206
52
0
07 Jan 2022
Discrete and continuous representations and processing in deep learning: Looking forward
AI Open (AO), 2022
Ruben Cartuyvels
Graham Spinks
Marie-Francine Moens
OCL
300
28
0
04 Jan 2022
Multi-Dialect Arabic Speech Recognition
IEEE International Joint Conference on Neural Network (IJCNN), 2020
Abbas Raza Ali
77
19
0
25 Dec 2021
Parameter identifiability of a deep feedforward ReLU neural network
Machine-mediated learning (ML), 2021
Joachim Bona-Pellissier
François Bachoc
François Malgouyres
271
20
0
24 Dec 2021
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
S. Agarwal
Liwen Hu
Evonne Ng
Trevor Darrell
Hao Li
Anna Rohrbach
AAML
205
25
0
21 Dec 2021
ImportantAug: a data augmentation agent for speech
V. Trinh
Hassan Salami Kavaki
Michael I. Mandel
211
12
0
14 Dec 2021